Posted to dev@flume.apache.org by Juhani Connolly <ju...@cyberagent.co.jp> on 2013/11/25 09:14:50 UTC

Regarding the adding of additional sinks/sources for various DB's

Hey guys,

What I write here is all just my personal opinion and I'm writing in 
hopes of starting a discussion and/or getting feedback. I know I've not 
been very active on the project recently (due to other engagements) but
do still want it to succeed and hope to find more time for it eventually.

Right now I see new/active issues for the addition of Redis and Kafka 
sinks, and while they're nice features, I'm personally concerned about 
feature bloat in the project. There are dozens of interceptors, sinks
and sources one could think of, but most of them are tied to a single,
narrow use case.

Every time we add a new component we're also committing to maintaining 
it over future releases, even if the original contributor gets too busy 
for it. The more such components get added, the more we will get 
distracted from improving core features and getting rid of issues 
affecting them.

For these reasons I generally haven't submitted components we developed
for internal use (because they are too specific to our use cases), and
have only passed back bug fixes and changes that apply to the core project.

I therefore think we may want to consider either a) being more
selective regarding additional component submissions or b) adding a
contrib directory to the project which includes such components but
doesn't guarantee ongoing maintenance or compatibility.

On the flip side of course, taking approaches like this may discourage 
new contributors and could thus be considered a negative, and if many 
people feel this way they should definitely share their thoughts.

I'd be curious to know what others think, and what direction they hope 
to take the project in the future.

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Bruno Mahé <bm...@apache.org>.

Hi,

I should probably chime in since I submitted the patch for the Redis sink.

I see the arguments for keeping Apache Flume lean, but I am not sure
the benefits of doing so outweigh the costs.

As a user, having Apache Flume able to talk to many sources and sinks
is a big plus. Having to shop around for various sources/sinks is more
troublesome, since I first have to find which flavor of a given sink is
being maintained today, then deal with licenses, incompatibilities,
mismatched versions, upgrades, deployment and unfixed bugs, all while
wondering whether it is even going to work at all.
Knowing a piece of code is in Apache Flume puts my mind at ease since
the license is clear, the CLA has been cleared and the code has been
reviewed. There may be some expectations regarding its support and
quality, but that should be fine as long as its status is clearly stated
and labeled (see the contrib idea, or tagging components with labels
such as "supported" or "experimental"). It also gives bugs more
opportunities to be fixed, and therefore keeps the code better
maintained, thanks to the wider audience of Apache Flume in comparison
to a random small project on GitHub.
Also as a user, I would have to be fairly technical to use a random
source/sink outside of Apache Flume. I would probably have to build it,
qualify it against my version of Apache Flume, and package it for
deployment. Whereas if it is in Apache Flume, it is either already in the
tarball or already in the package of my favorite Apache Flume distribution.

As a developer, Apache Flume is very flexible since I can pick and
choose most parts. But if I have to write my own source and/or my own
sink, I may be tempted to forgo Apache Flume altogether and write the
rest myself for my specific use case.
And if I do write a source for my use case, I don't have much
incentive to make it public or to keep it working with the current Apache
Flume version. I just need to ensure it works for my version of Apache
Flume. Everything else is just extra work.
Also, in the context of a company, I would rather target my source/sink
at a vendor-supported version of Apache Flume, which may
be different from the latest Apache Flume. I would have no incentive to
go through the effort of testing it against the latest Apache Flume. If my
source/sink were in Apache Flume, I would be more interested in
contributing to Apache Flume since I know the changes would trickle down
at some point and make my life easier.

As an Apache Bigtop contributor, having all these projects spread around
scares me. They will all depend on different versions of Apache
Flume, build in different ways, work in different ways and integrate in
their own way. Sending patches upstream will also be troublesome since
we would have to talk to and work with a lot more people than just
the Apache Flume folks, each with different schedules and
ways of working.

In conclusion, I believe having a diverse set of sources, sinks and
channels may not be a bad idea. If such a piece is not maintained and no
one is willing to maintain it, then I don't see why it could not be removed.

To prevent a source/sink/channel from rotting, besides creating a
contrib area, we could also do the following:
* Tag components based on their known quality and stability.
* Be strict about unit tests.
* Maybe require some integration tests as well.

Thanks,
Bruno

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Ashish <pa...@gmail.com>.

On Mon, Nov 25, 2013 at 1:44 PM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

> Hey guys,
>
> What I write here is all just my personal opinion and I'm writing in hopes
> of starting a discussion and/or getting feedback. I know I've not been very
> active on the project recently (due to other engagements) but do still want
> it to succeed and hope to find more time for it eventually.
>

You have been helping a lot :)


>
> Right now I see new/active issues for the addition of Redis and Kafka
> sinks, and while they're nice features, I'm personally concerned about
> feature bloat of the project. There are dozens of interceptors, sinks and
> sources that can be thought of, but most of them are very specific to a
> specific use-case.
>
> Every time we add a new component we're also committing to maintaining it
> over future releases, even if the original contributor gets too busy for
> it. The more such components get added, the more we will get distracted
> from improving core features and getting rid of issues affecting them.
>

Very true.


>
> For these reasons I generally haven't submitted components we developed
> for internal use (because they are too specific to our use cases), just
> passing back fixes that fix bugs or apply to the core project.
>
> For these reasons I think we may want to consider either a) being more
> selective regarding additional component submissions or b) make a contrib
> directory to the project which includes the components but doesn't
> guarantee ongoing maintenance or compatibility.
>

+1 Like this idea.


>
> On the flip side of course, taking approaches like this may discourage new
> contributors and could thus be considered a negative, and if many people
> feel this way they should definitely share their thoughts.
>

Being new to Flume, I don't think it would discourage contributions,
as long as we have a clear line of thought about what goes where.


>
> I'd be curious to know what others think, and what direction they hope to
> take the project in the future.
>


The richness of the Source/Sink implementations available with Flume is a big
plus. This is an important discussion; I will let the core Flume devs
discuss it further.



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Steve Morin <st...@stevemorin.com>.

Frank,
  I would definitely vote for including the Kafka source/sink in Flume.
-Steve


On Fri, Nov 29, 2013 at 7:11 AM, Frank Yao <ba...@gmail.com> wrote:

> I'm the 'guy' who is working on the Kafka sources/sinks :).
>
> Support for popular, fast-growing and mature products needs to be
> merged into Flume. Why?
> a) Sources/sinks for mature products are a real draw for Flume users.
> b) Developers are willing to add new sources/sinks to Flume.
>
> For a), a developer said earlier: 'Having to shop around for
> various sources/sinks is more troublesome since I have to first find which
> flavor of a given sink is being maintained today, deal with licenses,
> incompatibilities, mismatch versions, upgrades, deployment, not fixed bugs
> and wondering if this is even going to work at all.' I think this is
> common for people who want to use Flume but cannot find what they want at
> first.
> For b), if developers offer plugins to Flume and are rejected only for the
> sake of keeping Flume lean, they will lose their passion for contributing
> to Flume.
>
> I think that if developers write sources/sinks for Flume and believe
> Flume users badly need them, they should request a vote by the
> committers. If most committers think the component is necessary, then merge
> it into Flume. If not, then move it to a Flume-contrib project, like
> what ElasticSearch does.
>
>
>
> -----------------
>
> 姚仁捷 Frank Yao
> @超大杯摩卡星冰乐 <http://weibo.com/frankymryao>
> http://baniu.me
> Vipshop, Shanghai
>
>
>
>
> On Fri, Nov 29, 2013 at 3:17 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
> > If you think "ecosystem", and an OSS project like Flume should very much
> > think ecosystem, then leaving things on GitHub, etc. probably makes more
> > sense.  Over the years (now over a decade!) I've witnessed what happens
> > with the contrib/-type approach - authors need to have access to maintain
> > their stuff, make it work with the build system changes, make it work
> > before it is released, etc., which is all hard, and very often you can't
> > just give contrib authors Apache commit rights.  So instead of trying to
> > pull everything in, one should focus on *making developer-friendly
> > core/APIs*.  Developers will then build tools that work with this core and
> > naturally create a rich ecosystem of tools around it.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> >
> > On Nov 29, 2013 12:14 AM, "Jeremy Karlson" <je...@gmail.com>
> > wrote:
> >
> > > As someone who just developed a sink, let me add my two cents.
> > >
> > > If the intention is to separate “core Flume” from second class citizens
> > > like myself ( ;-) ), a contrib module only makes sense if those
> > > contributors can manage fixes and commit to their modules themselves.
> > > Waiting for core developers to apply changes to modules they don’t want
> > > to work on will just leave maintainers like myself annoyed at waiting
> > > and core contributors annoyed at having to do it.  I think you’d have to
> > > hand out commit abilities to several people for there to be smiles all
> > > round.
> > >
> > > If you don’t want to or can’t do that (understandable), maybe just let
> > > everyone do their own module management on GitHub or whatever, and
> > > provide a page that links to “add on” modules.  (This is the approach
> > > Elasticsearch takes, I think.)
> > >
> > > -- Jeremy
> > >
> > >
> > > On Nov 28, 2013, at 11:06, Hari Shreedharan <hshreedharan@cloudera.com>
> > > wrote:
> > >
> > > > Juhani and others,
> > > >
> > > > I agree that it does make sense to add a contrib module to Flume where
> > > > non-hadoopy stuff can go. I will start a discussion on this early next
> > > > week.
> > > >
> > > > Hari
> > > >
> > > > On Thursday, November 28, 2013, Steve Morin wrote:
> > > >
> > > >> Israel,
> > > >> I guess my question is: why the suggestion to use the Elasticsearch
> > > >> model? Is there something you see that's not working?
> > > >> -Steve
> > > >>
> > > >>
> > > >> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org>
> > > >> wrote:
> > > >>
> > > >>> I think we can take a page out of the ElasticSearch playbook.
> > > >>>
> > > >>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
> > > >>>
> > > >>> I like the model they follow.
> > > >>>
> > > >>> The Flume architecture makes it easy to write plugins at any layer
> > > >>> (source, interceptor, sink etc.)
> > > >>>
> > > >>> Contributors can host plugins on GitHub and manage the documentation
> > > >>> and maintenance of the plugin.
> > > >>>
> > > >>> Others can chip in when possible to improve or maintain the plugins.
> > > >>>
> > > >>> This will still allow new features to be added to the project without
> > > >>> necessarily meaning that Flume committers are on the hook for
> > > >>> maintaining them.
> > > >>>
> > > >>>
> > > >>>
> > > >>> *Author and Instructor for the Upcoming Book and Lecture Series*
> > > >>> *Massive Log Data Aggregation, Processing, Searching and
> > > >>> Visualization with Open Source Software*
> > > >>> *http://massivelogdata.com*
> > > >>>
> > > >>>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Bruno Mahé <bm...@apache.org>.

See inline.

On 12/01/2013 05:31 PM, Juhani Connolly wrote:
> Commenting inline
>
> On 11/30/2013 09:03 AM, Bruno Mahé wrote:
>>
>> If a component gets developed outside of Apache, what is the cost to
>> integrate it? How would that work? I assume one would have to go
>> through a code grant. If so, would a developer have to go track all
>> past contributors to agree on the code grant terms? Or should a CLA be
>> prepared in case of such event?
>>
>>
>> Why is it bad to get a new source/sink for the new, fancy, FooBarBaz
>> data store? What if it does take off? I would contend that the success
>> of that data store does not matter as long as there is a community
>> interested in it.
>> And why is it so costly to include it when it is so cheap to remove it
>> when no one cares about it anymore?
>>
> We can't just "remove it when no one cares about it". If it goes into
> the main trunk, we're committing to providing the feature, at least
> until a major version change, and even then it would generally only be
> phased out if it gets deprecated by another equivalent component.

Why can't we just remove it when no one cares about it? If the
community does not care enough to maintain it, there is no reason to
keep it.
If it is in trunk then it has not been released yet and it is fair game.
I would not see any issue with removing, before a release, any part that
is not in a releasable state and that no one is willing to maintain. From
my point of view, if it is in trunk and not released, then it can be
completely changed or even dropped at any time.
If it is part of a release, then keeping it for bug fix releases should
not really matter: the status quo should be achievable without much
trouble since the Apache Flume APIs should remain stable. And since we are
talking about bug fix releases, there is no reason to change anything in it
besides bug fixes contributed by interested parties (committers or not).

And why must there be another equivalent component before we can
deprecate one?
Is such a policy stated anywhere?
If people care that much about such a component, then someone will
volunteer to help maintain it. Otherwise the community is not that
interested in it.

In any case the code will not be lost. Anyone will be one git/svn
command away from bringing it back.

>>
>> Including people from the outer layers also increases the chances of
>> getting them interested in the core of Apache Flume. They will most
>> likely notice something to improve somewhere else and start helping in
>> core parts. Even more so if they are going through all the steps to
>> get their code in Apache Flume.
> I agree with this, and it's the strongest argument for putting new stuff
> in. Of course if these people just contribute their component and then
> leave it for others to maintain, things get more difficult.

Provided we do not hesitate to remove unmaintained components from time
to time, this would be a non-issue. It would enable us to openly accept
interested community members while taking out unmaintained code.
Otherwise I would agree with your concerns.

>> Imho, the cost on passing on possible contributors is quite high
>> comparing to deleting unmaintained parts. Not just for the core Apache
>> Flume developers, but for the users in general as well.
>> Thanks,
>> Bruno
>>
>>
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

Commenting inline

On 11/30/2013 09:03 AM, Bruno Mahé wrote:
>
> If a component gets developed outside of Apache, what is the cost to 
> integrate it? How would that work? I assume one would have to go 
> through a code grant. If so, would a developer have to go track all 
> past contributors to agree on the code grant terms? Or should a CLA be 
> prepared in case of such event?
>
>
> Why is it bad to get a new source/sink for the new, fancy, FooBarBaz
> data store? What if it does take off? I would contend that the success 
> of that data store does not matter as long as there is a community 
> interested in it.
> And why is it so costly to include it when it is so cheap to remove it 
> when no one cares about it anymore?
>
We can't just "remove it when no one cares about it". If it goes into 
the main trunk, we're committing to providing the feature, at least 
until a major version change, and even then it would generally only be 
phased out if it gets deprecated by another equivalent component.
>
> Including people from the outer layers also increases the chances of 
> getting them interested in the core of Apache Flume. They will most 
> likely notice something to improve somewhere else and start helping in 
> core parts. Even more so if they are going through all the steps to 
> get their code in Apache Flume.
I agree with this, and it's the strongest argument for putting new stuff 
in. Of course if these people just contribute their component and then 
leave it for others to maintain, things get more difficult.
> Imho, the cost on passing on possible contributors is quite high 
> comparing to deleting unmaintained parts. Not just for the core Apache 
> Flume developers, but for the users in general as well.
> Thanks,
> Bruno
>
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Bruno Mahé <bm...@apache.org>.

On 11/29/2013 08:17 AM, Otis Gospodnetic wrote:
> If I understood this correctly, then I think you can also think of this
> from the opposite direction.  Instead of moving sources/sinks OUT of Flume,
> let them develop outside Flume, and if they prove to be good, in high
> demand, often used with Flume, or in need of "adoption" by core Flume
> developers, then BRING IN such sources/sinks and, optionally(?), their
> authors with them.  Maybe this way you can have the best of both worlds.
>
> If I develop source/sink for the new, fancy, FooBarBaz data store, can I
> get it in Flume?  No.  and that's good, because if FooBarBaz doesn't take
> off, who'll want to maintain it?
> If I develop source/sink for Kafka, which is known to be very popular and
> often used with Flume, and there are either core Flume developers
> interested in maintaining this source/sink, or if there are external
> developers who have been working on this outside Apache (i.e. the chances
> are they'll follow the project to Apache and continue contributing to it,
> eventually earning the committer rights), then this source/sink for Kafka
> should be considered for adoption by Flume.
>
> This way you let the external ecosystem live on, develop independently,
> compete for user and developer adoption, and once good sources/sinks emerge
> from there, they can be moved under Flume if everyone involved agrees
> that's the right thing to do.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>

If a component gets developed outside of Apache, what is the cost to
integrate it? How would that work? I assume one would have to go through
a code grant. If so, would a developer have to track down all past
contributors to get them to agree to the code grant terms? Or should a CLA
be prepared in case of such an event?

Why is it bad to get a new source/sink for the new, fancy, FooBarBaz data
store? What if it does take off? I would contend that the success of
that data store does not matter as long as there is a community
interested in it.
And why is it so costly to include it when it is so cheap to remove it
when no one cares about it anymore?

Including people from the outer layers also increases the chances of
getting them interested in the core of Apache Flume. They will most
likely notice something to improve somewhere else and start helping with
the core parts. Even more so if they are going through all the steps to get
their code into Apache Flume.
IMHO, the cost of passing on possible contributors is quite high
compared to that of deleting unmaintained parts. Not just for the core Apache
Flume developers, but for the users in general as well.

Thanks,
Bruno

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Otis Gospodnetic <ot...@gmail.com>.

If I understood this correctly, then I think you can also think of this
from the opposite direction.  Instead of moving sources/sinks OUT of Flume,
let them develop outside Flume, and if they prove to be good, in high
demand, often used with Flume, or in need of "adoption" by core Flume
developers, then BRING IN such sources/sinks and, optionally(?), their
authors with them.  Maybe this way you can have the best of both worlds.

If I develop a source/sink for the new, fancy, FooBarBaz data store, can I
get it into Flume?  No.  And that's good, because if FooBarBaz doesn't take
off, who'll want to maintain it?
If I develop a source/sink for Kafka, which is known to be very popular and
often used with Flume, and there are either core Flume developers
interested in maintaining this source/sink, or external
developers who have been working on it outside Apache (i.e. the chances
are they'll follow the project to Apache and continue contributing to it,
eventually earning committer rights), then this source/sink for Kafka
should be considered for adoption by Flume.

This way you let the external ecosystem live on, develop independently,
compete for user and developer adoption, and once good sources/sinks emerge
from there, they can be moved under Flume if everyone involved agrees
that's the right thing to do.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Nov 29, 2013 at 10:11 AM, Frank Yao <ba...@gmail.com> wrote:

> I'm the 'guys' who is working on Kafka source/sinks:).
>
> A feature for popular, fast-growing and mature products is necessary to
> merge into Flume. Why?
> a) sources/sinks of mature products are really a motivation to flume users.
> b) developers have willing to add new sources/sinks to flume.
>
> For a), early there is a developer said, ' Having to shop around for
> various sources/sinks is more troublesome since I have to first find which
> flavor of a given sink is being maintained today, deal with licenses,
> incompatibilities, mismatch versions, upgrades, deployment, not fixed bugs
> and wondering if this is even going to work at all.' I thinks this is
> common for people who want to use flume but cannot found what he wanted at
> first.
> For b), if developers give plugins to flume and are rejected only because
> of keeping Flume lean, developers will lose their passion of contributing
> to Flume.
>
> I think if developers made sources/sinks for Flume, and he/she thought the
> sources/sinks were in great need by Flume users, he/she need to request a
> vote by committers, if most committers think it's necessary, then merge
> these to Flume. If not, then move these to  Flume-contrib projects like
> what ElasticSearch does.
>
>
>
> -----------------
>
> 姚仁捷 Frank Yao
> @超大杯摩卡星冰乐 <http://weibo.com/frankymryao>
> http://baniu.me
> Vipshop, Shanghai
>
>
>
>
> On Fri, Nov 29, 2013 at 3:17 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
> > If you think "ecosystem", and an OSS project like Flume should very much
> > think ecosystem, then leaving things on Github, etc. probably makes more
> > sense.  Over the years (now over a decade!) I've witnessed what happens
> > with contrib/-type approach - authors need to have access to maintain
> their
> > stuff, make it work with the build system changes, make it work before
> > released, etc. etc,, which is all hard, and very often you can't just
> give
> > contrib authors Apache commit rights.  So instead of trying to pull
> > everything in, one should focus on *making developer-friendly core/APIs".
> >  Developers will then build tools that work with this core and naturally
> > create a rich ecosystem of tools around it.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> >
> > On Nov 29, 2013 12:14 AM, "Jeremy Karlson" <je...@gmail.com>
> > wrote:
> >
> > > As someone who just developed a sink, let me add my two cents.
> > >
> > > If the intention is to separate “core Flume” from second class citizens
> > > like myself ( ;-) ), a contrib module only makes sense if those
> > > contributors can manage fixes and commit to their modules themselves.
> > >  Waiting for core developers to apply changes to modules they don’t
> want
> > to
> > > work on will just leave maintainers like myself annoyed at waiting and
> > core
> > > contributors annoyed at having to do it.  I think you’d have to hand
> out
> > > commit abilities to several people for there to be smiles all round.
> > >
> > > If you don’t want to or can’t do that (understandable), maybe just let
> > > everyone do their own module management on GitHub or whatever, and
> > provide
> > > a page that links to “add on” modules.  (This is the approach
> > Elasticsearch
> > > takes, I think.)
> > >
> > > -- Jeremy
> > >
> > >
> > > On Nov 28, 2013, at 11:06, Hari Shreedharan <hshreedharan@cloudera.com
> >
> > > wrote:
> > >
> > > > Juhani and others,
> > > >
> > > > I agree that it does make sense to add a contrib module to flume
> where
> > > > non-hadoopy stuff can go. I will start a discussion on this early
> next
> > > week.
> > > >
> > > > Hari
> > > >
> > > > On Thursday, November 28, 2013, Steve Morin wrote:
> > > >
> > > >> Israel,
> > > >> I guess my questions is why the suggestion to use the elastic search
> > > >> model, is there something you see that's not working?
> > > >> -Steve
> > > >>
> > > >>
> > > >> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org
> > > <javascript:;>>
> > > >> wrote:
> > > >>
> > > >>> I think we can take a page of out the ElasticSearch playbook.
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
> > > >>>
> > > >>> I like the model they follow.
> > > >>>
> > > >>> The Flume architecture makes it easy for plugins at any layer
> > (source,
> > > >>> interceptor, sink etc)
> > > >>>
> > > >>> Contributors can host plugins on github and manage the
> documentation
> > > and
> > > >>> maintenance of the plugin.
> > > >>>
> > > >>> Others can chip it when possible to improve or maintain the
> plugins.
> > > >>>
> > > >>> This will still allow new features to the project without
> necessarily
> > > >>> meaning that Flume committers are on the hook for maintaining it.
> > > >>>
> > > >>>
> > > >>>
> > > >>> *Author and Instructor for the Upcoming Book and Lecture Series*
> > > >>> *Massive Log Data Aggregation, Processing, Searching and
> > Visualization
> > > >> with
> > > >>> Open Source Software*
> > > >>> *http://massivelogdata.com <http://massivelogdata.com>*
> > > >>>
> > > >>>
> > > >>> On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
> > > > >>> juhani_connolly@cyberagent.co.jp> wrote:
> > > >>>
> > > >>>> Hey guys,
> > > >>>>
> > > >>>> What I write here is all just my personal opinion and I'm writing
> in
> > > >>> hopes
> > > >>>> of starting a discussion and/or getting feedback. I know I've not
> > been
> > > >>> very
> > > >>>> active on the project recently(due to other engagements) but do
> > still
> > > >>> want
> > > >>>> it to succeed and hope to find more time for it eventually.
> > > >>>>
> > > >>>> Right now I see new/active issues for the addition of Redis and
> > Kafka
> > > >>>> sinks, and while they're nice features, I'm personally concerned
> > about
> > > >>>> feature bloat of the project. There are dozens of interceptors,
> > sinks
> > > >> and
> > > >>>> sources that can be thought of, but most of them are very specific
> > to
> > > a
> > > >>>> specific use-case.
> > > >>>>
> > > >>>> Every time we add a new component we're also committing to
> > maintaining
> > > >> it
> > > >>>> over future releases, even if the original contributor gets too
> busy
> > > >> for
> > > >>>> it. The more such components get added, the more we will get
> > > distracted
> > > >>>> from improving core features and getting rid of issues affecting
> > them.
> > > >>>>
> > > >>>> For these reasons I generally haven't submitted components we
> > > developed
> > > >>>> for internal use(because they are too specific to our use cases),
> > just
> > > >>>> passing back fixes that fix bugs or apply to the core project.
> > > >>>>
> > > >>>> For these reasons I think we may want to consider either a) being
> > more
> > > >>>> selective regarding additional component submissions or b) make a
> > > >> contrib
> > > >>>> directory to the project which includes the components but doesn't
> > > >>>> guarrantee ongoing maintenance or compatibility.
> > > >>>>
> > > >>>> On the flip side of course, taking approaches like this may
> > discourage
> > > >>> new
> > > >>>> contributors and could thus be considered a negative, and if many
> > > >> people
> > > >>>> feel this way they should definitely share their thoughts.
> > > >>>>
> > > >>>> I'd be curious to know what others think, and what direction they
> > hope
> > > >> to
> > > >>>> take the project in the future.
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Frank Yao <ba...@gmail.com>.

I'm one of the 'guys' working on the Kafka source/sinks :).

Support for popular, fast-growing and mature products should be merged
into Flume. Why?
a) Sources/sinks for mature products are a real draw for Flume users.
b) Developers are willing to add new sources/sinks to Flume.

For a), earlier a developer said, 'Having to shop around for
various sources/sinks is more troublesome since I have to first find which
flavor of a given sink is being maintained today, deal with licenses,
incompatibilities, mismatch versions, upgrades, deployment, not fixed bugs
and wondering if this is even going to work at all.' I think this is
common for people who want to use Flume but cannot find what they want at
first.
For b), if developers offer plugins to Flume and are rejected only to
keep Flume lean, they will lose their passion for contributing
to Flume.

I think that if a developer has written sources/sinks for Flume and believes
they are badly needed by Flume users, he/she should request a vote by the
committers. If most committers think they are necessary, then merge
them into Flume. If not, move them to a Flume-contrib project, like
what ElasticSearch does.



-----------------

姚仁捷 Frank Yao
@超大杯摩卡星冰乐 <http://weibo.com/frankymryao>
http://baniu.me
Vipshop, Shanghai




On Fri, Nov 29, 2013 at 3:17 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> If you think "ecosystem", and an OSS project like Flume should very much
> think ecosystem, then leaving things on Github, etc. probably makes more
> sense.  Over the years (now over a decade!) I've witnessed what happens
> with contrib/-type approach - authors need to have access to maintain their
> stuff, make it work with the build system changes, make it work before
> released, etc. etc,, which is all hard, and very often you can't just give
> contrib authors Apache commit rights.  So instead of trying to pull
> everything in, one should focus on *making developer-friendly core/APIs".
>  Developers will then build tools that work with this core and naturally
> create a rich ecosystem of tools around it.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>
> On Nov 29, 2013 12:14 AM, "Jeremy Karlson" <je...@gmail.com>
> wrote:
>
> > As someone who just developed a sink, let me add my two cents.
> >
> > If the intention is to separate “core Flume” from second class citizens
> > like myself ( ;-) ), a contrib module only makes sense if those
> > contributors can manage fixes and commit to their modules themselves.
> >  Waiting for core developers to apply changes to modules they don’t want
> to
> > work on will just leave maintainers like myself annoyed at waiting and
> core
> > contributors annoyed at having to do it.  I think you’d have to hand out
> > commit abilities to several people for there to be smiles all round.
> >
> > If you don’t want to or can’t do that (understandable), maybe just let
> > everyone do their own module management on GitHub or whatever, and
> provide
> > a page that links to “add on” modules.  (This is the approach
> Elasticsearch
> > takes, I think.)
> >
> > -- Jeremy
> >
> >
> > On Nov 28, 2013, at 11:06, Hari Shreedharan <hs...@cloudera.com>
> > wrote:
> >
> > > Juhani and others,
> > >
> > > I agree that it does make sense to add a contrib module to flume where
> > > non-hadoopy stuff can go. I will start a discussion on this early next
> > week.
> > >
> > > Hari
> > >
> > > On Thursday, November 28, 2013, Steve Morin wrote:
> > >
> > >> Israel,
> > >> I guess my questions is why the suggestion to use the elastic search
> > >> model, is there something you see that's not working?
> > >> -Steve
> > >>
> > >>
> > >> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org>
> > >> wrote:
> > >>
> > >>> I think we can take a page of out the ElasticSearch playbook.
> > >>>
> > >>>
> > >>>
> > >>
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
> > >>>
> > >>> I like the model they follow.
> > >>>
> > >>> The Flume architecture makes it easy for plugins at any layer
> (source,
> > >>> interceptor, sink etc)
> > >>>
> > >>> Contributors can host plugins on github and manage the documentation
> > and
> > >>> maintenance of the plugin.
> > >>>
> > >>> Others can chip it when possible to improve or maintain the plugins.
> > >>>
> > >>> This will still allow new features to the project without necessarily
> > >>> meaning that Flume committers are on the hook for maintaining it.
> > >>>
> > >>>
> > >>>
> > >>> *Author and Instructor for the Upcoming Book and Lecture Series*
> > >>> *Massive Log Data Aggregation, Processing, Searching and
> Visualization
> > >> with
> > >>> Open Source Software*
> > >>> *http://massivelogdata.com <http://massivelogdata.com>*
> > >>>
> > >>>
> > >>> On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
> > >>> juhani_connolly@cyberagent.co.jp> wrote:
> > >>>
> > >>>> Hey guys,
> > >>>>
> > >>>> What I write here is all just my personal opinion and I'm writing in
> > >>> hopes
> > >>>> of starting a discussion and/or getting feedback. I know I've not
> been
> > >>> very
> > >>>> active on the project recently(due to other engagements) but do
> still
> > >>> want
> > >>>> it to succeed and hope to find more time for it eventually.
> > >>>>
> > >>>> Right now I see new/active issues for the addition of Redis and
> Kafka
> > >>>> sinks, and while they're nice features, I'm personally concerned
> about
> > >>>> feature bloat of the project. There are dozens of interceptors,
> sinks
> > >> and
> > >>>> sources that can be thought of, but most of them are very specific
> to
> > a
> > >>>> specific use-case.
> > >>>>
> > >>>> Every time we add a new component we're also committing to
> maintaining
> > >> it
> > >>>> over future releases, even if the original contributor gets too busy
> > >> for
> > >>>> it. The more such components get added, the more we will get
> > distracted
> > >>>> from improving core features and getting rid of issues affecting
> them.
> > >>>>
> > >>>> For these reasons I generally haven't submitted components we
> > developed
> > >>>> for internal use(because they are too specific to our use cases),
> just
> > >>>> passing back fixes that fix bugs or apply to the core project.
> > >>>>
> > >>>> For these reasons I think we may want to consider either a) being
> more
> > >>>> selective regarding additional component submissions or b) make a
> > >> contrib
> > >>>> directory to the project which includes the components but doesn't
> > >>>> guarrantee ongoing maintenance or compatibility.
> > >>>>
> > >>>> On the flip side of course, taking approaches like this may
> discourage
> > >>> new
> > >>>> contributors and could thus be considered a negative, and if many
> > >> people
> > >>>> feel this way they should definitely share their thoughts.
> > >>>>
> > >>>> I'd be curious to know what others think, and what direction they
> hope
> > >> to
> > >>>> take the project in the future.
> > >>>>
> > >>>
> > >>
> >
> >
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Otis Gospodnetic <ot...@gmail.com>.

If you think "ecosystem", and an OSS project like Flume should very much
think ecosystem, then leaving things on Github, etc. probably makes more
sense.  Over the years (now over a decade!) I've witnessed what happens
with the contrib/-type approach - authors need to have access to maintain their
stuff, make it work with the build system changes, make it work before
release, etc., which is all hard, and very often you can't just give
contrib authors Apache commit rights.  So instead of trying to pull
everything in, one should focus on *making developer-friendly core/APIs*.
 Developers will then build tools that work with this core and naturally
create a rich ecosystem of tools around it.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/




On Nov 29, 2013 12:14 AM, "Jeremy Karlson" <je...@gmail.com> wrote:

> As someone who just developed a sink, let me add my two cents.
>
> If the intention is to separate “core Flume” from second class citizens
> like myself ( ;-) ), a contrib module only makes sense if those
> contributors can manage fixes and commit to their modules themselves.
>  Waiting for core developers to apply changes to modules they don’t want to
> work on will just leave maintainers like myself annoyed at waiting and core
> contributors annoyed at having to do it.  I think you’d have to hand out
> commit abilities to several people for there to be smiles all round.
>
> If you don’t want to or can’t do that (understandable), maybe just let
> everyone do their own module management on GitHub or whatever, and provide
> a page that links to “add on” modules.  (This is the approach Elasticsearch
> takes, I think.)
>
> -- Jeremy
>
>
> On Nov 28, 2013, at 11:06, Hari Shreedharan <hs...@cloudera.com>
> wrote:
>
> > Juhani and others,
> >
> > I agree that it does make sense to add a contrib module to flume where
> > non-hadoopy stuff can go. I will start a discussion on this early next
> week.
> >
> > Hari
> >
> > On Thursday, November 28, 2013, Steve Morin wrote:
> >
> >> Israel,
> >> I guess my questions is why the suggestion to use the elastic search
> >> model, is there something you see that's not working?
> >> -Steve
> >>
> >>
> >> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org>
> >> wrote:
> >>
> >>> I think we can take a page of out the ElasticSearch playbook.
> >>>
> >>>
> >>>
> >>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
> >>>
> >>> I like the model they follow.
> >>>
> >>> The Flume architecture makes it easy for plugins at any layer (source,
> >>> interceptor, sink etc)
> >>>
> >>> Contributors can host plugins on github and manage the documentation
> and
> >>> maintenance of the plugin.
> >>>
> >>> Others can chip it when possible to improve or maintain the plugins.
> >>>
> >>> This will still allow new features to the project without necessarily
> >>> meaning that Flume committers are on the hook for maintaining it.
> >>>
> >>>
> >>>
> >>> *Author and Instructor for the Upcoming Book and Lecture Series*
> >>> *Massive Log Data Aggregation, Processing, Searching and Visualization
> >> with
> >>> Open Source Software*
> >>> *http://massivelogdata.com <http://massivelogdata.com>*
> >>>
> >>>
> >>> On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
> >>> juhani_connolly@cyberagent.co.jp> wrote:
> >>>
> >>>> Hey guys,
> >>>>
> >>>> What I write here is all just my personal opinion and I'm writing in
> >>> hopes
> >>>> of starting a discussion and/or getting feedback. I know I've not been
> >>> very
> >>>> active on the project recently(due to other engagements) but do still
> >>> want
> >>>> it to succeed and hope to find more time for it eventually.
> >>>>
> >>>> Right now I see new/active issues for the addition of Redis and Kafka
> >>>> sinks, and while they're nice features, I'm personally concerned about
> >>>> feature bloat of the project. There are dozens of interceptors, sinks
> >> and
> >>>> sources that can be thought of, but most of them are very specific to
> a
> >>>> specific use-case.
> >>>>
> >>>> Every time we add a new component we're also committing to maintaining
> >> it
> >>>> over future releases, even if the original contributor gets too busy
> >> for
> >>>> it. The more such components get added, the more we will get
> distracted
> >>>> from improving core features and getting rid of issues affecting them.
> >>>>
> >>>> For these reasons I generally haven't submitted components we
> developed
> >>>> for internal use(because they are too specific to our use cases), just
> >>>> passing back fixes that fix bugs or apply to the core project.
> >>>>
> >>>> For these reasons I think we may want to consider either a) being more
> >>>> selective regarding additional component submissions or b) make a
> >> contrib
> >>>> directory to the project which includes the components but doesn't
> >>>> guarrantee ongoing maintenance or compatibility.
> >>>>
> >>>> On the flip side of course, taking approaches like this may discourage
> >>> new
> >>>> contributors and could thus be considered a negative, and if many
> >> people
> >>>> feel this way they should definitely share their thoughts.
> >>>>
> >>>> I'd be curious to know what others think, and what direction they hope
> >> to
> >>>> take the project in the future.
> >>>>
> >>>
> >>
>
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Jeremy Karlson <je...@gmail.com>.

As someone who just developed a sink, let me add my two cents.

If the intention is to separate “core Flume” from second class citizens like myself ( ;-) ), a contrib module only makes sense if those contributors can manage fixes and commit to their modules themselves.  Waiting for core developers to apply changes to modules they don’t want to work on will just leave maintainers like myself annoyed at waiting and core contributors annoyed at having to do it.  I think you’d have to hand out commit abilities to several people for there to be smiles all round.

If you don’t want to or can’t do that (understandable), maybe just let everyone do their own module management on GitHub or whatever, and provide a page that links to “add on” modules.  (This is the approach Elasticsearch takes, I think.)

-- Jeremy


On Nov 28, 2013, at 11:06, Hari Shreedharan <hs...@cloudera.com> wrote:

> Juhani and others,
> 
> I agree that it does make sense to add a contrib module to flume where
> non-hadoopy stuff can go. I will start a discussion on this early next week.
> 
> Hari
> 
> On Thursday, November 28, 2013, Steve Morin wrote:
> 
>> Israel,
>> I guess my questions is why the suggestion to use the elastic search
>> model, is there something you see that's not working?
>> -Steve
>> 
>> 
>> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org>
>> wrote:
>> 
>>> I think we can take a page of out the ElasticSearch playbook.
>>> 
>>> 
>>> 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
>>> 
>>> I like the model they follow.
>>> 
>>> The Flume architecture makes it easy for plugins at any layer (source,
>>> interceptor, sink etc)
>>> 
>>> Contributors can host plugins on github and manage the documentation and
>>> maintenance of the plugin.
>>> 
>>> Others can chip it when possible to improve or maintain the plugins.
>>> 
>>> This will still allow new features to the project without necessarily
>>> meaning that Flume committers are on the hook for maintaining it.
>>> 
>>> 
>>> 
>>> *Author and Instructor for the Upcoming Book and Lecture Series*
>>> *Massive Log Data Aggregation, Processing, Searching and Visualization
>> with
>>> Open Source Software*
>>> *http://massivelogdata.com <http://massivelogdata.com>*
>>> 
>>> 
>>> On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
>>> juhani_connolly@cyberagent.co.jp> wrote:
>>> 
>>>> Hey guys,
>>>> 
>>>> What I write here is all just my personal opinion and I'm writing in
>>> hopes
>>>> of starting a discussion and/or getting feedback. I know I've not been
>>> very
>>>> active on the project recently(due to other engagements) but do still
>>> want
>>>> it to succeed and hope to find more time for it eventually.
>>>> 
>>>> Right now I see new/active issues for the addition of Redis and Kafka
>>>> sinks, and while they're nice features, I'm personally concerned about
>>>> feature bloat of the project. There are dozens of interceptors, sinks
>> and
>>>> sources that can be thought of, but most of them are very specific to a
>>>> specific use-case.
>>>> 
>>>> Every time we add a new component we're also committing to maintaining
>> it
>>>> over future releases, even if the original contributor gets too busy
>> for
>>>> it. The more such components get added, the more we will get distracted
>>>> from improving core features and getting rid of issues affecting them.
>>>> 
>>>> For these reasons I generally haven't submitted components we developed
>>>> for internal use(because they are too specific to our use cases), just
>>>> passing back fixes that fix bugs or apply to the core project.
>>>> 
>>>> For these reasons I think we may want to consider either a) being more
>>>> selective regarding additional component submissions or b) make a
>> contrib
>>>> directory to the project which includes the components but doesn't
>>>> guarrantee ongoing maintenance or compatibility.
>>>> 
>>>> On the flip side of course, taking approaches like this may discourage
>>> new
>>>> contributors and could thus be considered a negative, and if many
>> people
>>>> feel this way they should definitely share their thoughts.
>>>> 
>>>> I'd be curious to know what others think, and what direction they hope
>> to
>>>> take the project in the future.
>>>> 
>>> 
>>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Hari Shreedharan <hs...@cloudera.com>.

Juhani and others,

I agree that it does make sense to add a contrib module to flume where
non-hadoopy stuff can go. I will start a discussion on this early next week.

Hari

On Thursday, November 28, 2013, Steve Morin wrote:

> Israel,
>  I guess my questions is why the suggestion to use the elastic search
> model, is there something you see that's not working?
> -Steve
>
>
> On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <israel@aicer.org>
> wrote:
>
> > I think we can take a page of out the ElasticSearch playbook.
> >
> >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
> >
> > I like the model they follow.
> >
> > The Flume architecture makes it easy for plugins at any layer (source,
> > interceptor, sink etc)
> >
> > Contributors can host plugins on github and manage the documentation and
> > maintenance of the plugin.
> >
> > Others can chip it when possible to improve or maintain the plugins.
> >
> > This will still allow new features to the project without necessarily
> > meaning that Flume committers are on the hook for maintaining it.
> >
> >
> >
> > *Author and Instructor for the Upcoming Book and Lecture Series*
> > *Massive Log Data Aggregation, Processing, Searching and Visualization
> with
> > Open Source Software*
> > *http://massivelogdata.com <http://massivelogdata.com>*
> >
> >
> > On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
> > juhani_connolly@cyberagent.co.jp> wrote:
> >
> > > Hey guys,
> > >
> > > What I write here is all just my personal opinion and I'm writing in
> > hopes
> > > of starting a discussion and/or getting feedback. I know I've not been
> > very
> > > active on the project recently(due to other engagements) but do still
> > want
> > > it to succeed and hope to find more time for it eventually.
> > >
> > > Right now I see new/active issues for the addition of Redis and Kafka
> > > sinks, and while they're nice features, I'm personally concerned about
> > > feature bloat of the project. There are dozens of interceptors, sinks
> and
> > > sources that can be thought of, but most of them are very specific to a
> > > specific use-case.
> > >
> > > Every time we add a new component we're also committing to maintaining
> it
> > > over future releases, even if the original contributor gets too busy
> for
> > > it. The more such components get added, the more we will get distracted
> > > from improving core features and getting rid of issues affecting them.
> > >
> > > For these reasons I generally haven't submitted components we developed
> > > for internal use(because they are too specific to our use cases), just
> > > passing back fixes that fix bugs or apply to the core project.
> > >
> > > For these reasons I think we may want to consider either a) being more
> > > selective regarding additional component submissions or b) make a
> contrib
> > > directory to the project which includes the components but doesn't
> > > guarrantee ongoing maintenance or compatibility.
> > >
> > > On the flip side of course, taking approaches like this may discourage
> > new
> > > contributors and could thus be considered a negative, and if many
> people
> > > feel this way they should definitely share their thoughts.
> > >
> > > I'd be curious to know what others think, and what direction they hope
> to
> > > take the project in the future.
> > >
> >
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Steve Morin <st...@stevemorin.com>.

Israel,
I guess my question is: why the suggestion to use the Elasticsearch
model? Is there something you see that's not working?
-Steve


On Mon, Nov 25, 2013 at 5:34 PM, Israel Ekpo <is...@aicer.org> wrote:

> I think we can take a page of out the ElasticSearch playbook.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
>
> I like the model they follow.
>
> The Flume architecture makes it easy for plugins at any layer (source,
> interceptor, sink etc)
>
> Contributors can host plugins on github and manage the documentation and
> maintenance of the plugin.
>
> Others can chip it when possible to improve or maintain the plugins.
>
> This will still allow new features to the project without necessarily
> meaning that Flume committers are on the hook for maintaining it.
>
>
>
> *Author and Instructor for the Upcoming Book and Lecture Series*
> *Massive Log Data Aggregation, Processing, Searching and Visualization with
> Open Source Software*
> *http://massivelogdata.com <http://massivelogdata.com>*
>
>
> On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
> > Hey guys,
> >
> > What I write here is all just my personal opinion and I'm writing in
> hopes
> > of starting a discussion and/or getting feedback. I know I've not been
> very
> > active on the project recently(due to other engagements) but do still
> want
> > it to succeed and hope to find more time for it eventually.
> >
> > Right now I see new/active issues for the addition of Redis and Kafka
> > sinks, and while they're nice features, I'm personally concerned about
> > feature bloat of the project. There are dozens of interceptors, sinks and
> > sources that can be thought of, but most of them are very specific to a
> > specific use-case.
> >
> > Every time we add a new component we're also committing to maintaining it
> > over future releases, even if the original contributor gets too busy for
> > it. The more such components get added, the more we will get distracted
> > from improving core features and getting rid of issues affecting them.
> >
> > For these reasons I generally haven't submitted components we developed
> > for internal use(because they are too specific to our use cases), just
> > passing back fixes that fix bugs or apply to the core project.
> >
> > For these reasons I think we may want to consider either a) being more
> > selective regarding additional component submissions or b) make a contrib
> > directory to the project which includes the components but doesn't
> > guarrantee ongoing maintenance or compatibility.
> >
> > On the flip side of course, taking approaches like this may discourage
> new
> > contributors and could thus be considered a negative, and if many people
> > feel this way they should definitely share their thoughts.
> >
> > I'd be curious to know what others think, and what direction they hope to
> > take the project in the future.
> >
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Israel Ekpo <is...@aicer.org>.

I think we can take a page out of the ElasticSearch playbook.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html

I like the model they follow.

The Flume architecture makes it easy for plugins at any layer (source,
interceptor, sink etc)

Contributors can host plugins on github and manage the documentation and
maintenance of the plugin.

Others can chip in when possible to improve or maintain the plugins.

This will still allow new features to be added to the project without
necessarily putting Flume committers on the hook for maintaining them.



*Author and Instructor for the Upcoming Book and Lecture Series*
*Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software*
*http://massivelogdata.com <http://massivelogdata.com>*


On Mon, Nov 25, 2013 at 3:14 AM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

> Hey guys,
>
> What I write here is all just my personal opinion and I'm writing in hopes
> of starting a discussion and/or getting feedback. I know I've not been very
> active on the project recently(due to other engagements) but do still want
> it to succeed and hope to find more time for it eventually.
>
> Right now I see new/active issues for the addition of Redis and Kafka
> sinks, and while they're nice features, I'm personally concerned about
> feature bloat of the project. There are dozens of interceptors, sinks and
> sources that can be thought of, but most of them are very specific to a
> specific use-case.
>
> Every time we add a new component we're also committing to maintaining it
> over future releases, even if the original contributor gets too busy for
> it. The more such components get added, the more we will get distracted
> from improving core features and getting rid of issues affecting them.
>
> For these reasons I generally haven't submitted components we developed
> for internal use(because they are too specific to our use cases), just
> passing back fixes that fix bugs or apply to the core project.
>
> For these reasons I think we may want to consider either a) being more
> selective regarding additional component submissions or b) make a contrib
> directory to the project which includes the components but doesn't
> guarrantee ongoing maintenance or compatibility.
>
> On the flip side of course, taking approaches like this may discourage new
> contributors and could thus be considered a negative, and if many people
> feel this way they should definitely share their thoughts.
>
> I'd be curious to know what others think, and what direction they hope to
> take the project in the future.
>

Re: Regarding the adding of additional sinks/sources for various DB's

Posted by Bruno Mahé <bm...@apache.org>.

On 11/25/2013 12:14 AM, Juhani Connolly wrote:
> Hey guys,
>
> What I write here is all just my personal opinion and I'm writing in
> hopes of starting a discussion and/or getting feedback. I know I've not
> been very active on the project recently(due to other engagements) but
> do still want it to succeed and hope to find more time for it eventually.
>
> Right now I see new/active issues for the addition of Redis and Kafka
> sinks, and while they're nice features, I'm personally concerned about
> feature bloat of the project. There are dozens of interceptors, sinks
> and sources that can be thought of, but most of them are very specific
> to a specific use-case.
>
> Every time we add a new component we're also committing to maintaining
> it over future releases, even if the original contributor gets too busy
> for it. The more such components get added, the more we will get
> distracted from improving core features and getting rid of issues
> affecting them.
>
> For these reasons I generally haven't submitted components we developed
> for internal use(because they are too specific to our use cases), just
> passing back fixes that fix bugs or apply to the core project.
>
> For these reasons I think we may want to consider either a) being more
> selective regarding additional component submissions or b) make a
> contrib directory to the project which includes the components but
> doesn't guarrantee ongoing maintenance or compatibility.
>
> On the flip side of course, taking approaches like this may discourage
> new contributors and could thus be considered a negative, and if many
> people feel this way they should definitely share their thoughts.
>
> I'd be curious to know what others think, and what direction they hope
> to take the project in the future.



Hi,

Is there anything I can do to help drive the discussion toward a 
conclusion, one way or the other?


Thanks,
Bruno