Posted to dev@flume.apache.org by Eric Sammer <es...@cloudera.com> on 2011/08/12 03:26:53 UTC

flume-728 - a refactoring

Flumeeeees:

Flume has evolved over the last few years and has come a long way. I
think, to hit the next bar of reliability, maintainability, and
adoption, some of the core bits need some refactoring / design
retrofit. To this end, I've started a "revolutionary branch[1]." I've
listed some of my rationale as to why I think this is a good thing in
the JIRA, but I'm happy to go into detail here.

My main motivation for this comes from working on Flume and supporting
it in my day job at Cloudera. That said, I do this as an individual,
and with my ASF hat firmly in place. My (short) rationale:
* I think the code base is too complex and that this is a barrier to
greater developer adoption. The internals shouldn't be scary.
* Some of the invariants of Flume have varied and remnants gum up the
works. For instance, there was a time where it was assumed there
wouldn't be multiple logical nodes per physical node; the complexities
of the threading came later.
* A few advertised features do not work as we'd expect / like. I want
to make it simpler to add these features.
* A number of recent bugs have exposed some evolutionary
implementation that could use refactoring.
* Flume does too much. It should do a smaller number of things (that
people really need / use) and do them exceedingly well. It's become
clear that some features are more important to people than others.

The details:

* The branch is at
http://svn.apache.org/viewvc/incubator/flume/branches/flume-728/
* There is already a (significantly smaller) core of Flume and a
skeletal Flume node.
* The wiki page tracking my notes and the "project" is at
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
* The parent JIRA tracking the project is at
https://issues.apache.org/jira/browse/FLUME-728

The process / intent:
* I intend to move extremely fast on the flume-728 branch and then
request a series of strict reviews and call for a vote to merge to
trunk. I'm happy to take reviews in the interim.
* I'd love folks to get involved and have this become a group effort.
The reason I started was to have a baseline to speak from and show 1.
that I'm serious (via code) and 2. what I think an implementation
could look like.
* I fully understand the community / PPMC may -1 the merge (but that
would make me sad, so why would you do that?). I also immediately
regretted using the "NG" designation; it's presumptuous and I
apologize. Going forward, I'll refer to it as flume-728.

Excited to hear feedback or questions. Thanks.

[1] jmhsieh pointed out an email from Long Ago(tm) that described this
situation well. I'm following that approach, in spirit.
http://incubator.apache.org/learn/rules-for-revolutionaries.html
-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: flume-728 - a refactoring

Posted by Bao Thai Ngo <ba...@gmail.com>.
Hi,

We've also run into many issues and resolved them by making some changes to
the Flume code. However, the complexity of Flume hindered us from writing
some efficient custom code.

As has been said, a clean core would be great.

+1.

cheers,
~Thai

On Fri, Aug 12, 2011 at 3:38 PM, Torsten Curdt <tc...@vafer.org> wrote:

> Sounds great!
>
> We recently started using Flume, ran into many (way too many)
> issues, and ended up writing quite a lot of custom code. Unnecessary
> complexity is just one of the things we found when we looked under
> the covers.
>
> I am currently trying to get the customer's permission to contribute
> back the changes we got there.
>
> Anyway, I guess all I am saying is: a fresh, new, and clean core is
> appreciated. Flume doesn't feel so stable that one has to worry much
> about backwards compatibility yet.
>
> Big +1
>
> cheers,
> Torsten
>

Re: flume-728 - a refactoring

Posted by Torsten Curdt <tc...@vafer.org>.
Sounds great!

We recently started using Flume, ran into many (way too many)
issues, and ended up writing quite a lot of custom code. Unnecessary
complexity is just one of the things we found when we looked under
the covers.

I am currently trying to get the customer's permission to contribute
back the changes we got there.

Anyway, I guess all I am saying is: a fresh, new, and clean core is
appreciated. Flume doesn't feel so stable that one has to worry much
about backwards compatibility yet.

Big +1

cheers,
Torsten

Re: flume-728 - a refactoring

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Strong +1.

I think this is a great philosophy and strikes a great balance.  I like how
it enables major core changes to happen in the open while balancing the needs
and requests of folks who depend on the current implementation.

Thanks,
Jon.

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: flume-728 - a refactoring

Posted by Ashish <pa...@gmail.com>.
+1

Some more comments inline

On Fri, Aug 12, 2011 at 6:56 AM, Eric Sammer <es...@cloudera.com> wrote:

> Flumeeeees:
>
> Flume has evolved over the last few years and has come a long way. I
> think, to hit the next bar of reliability, maintainability, and
> adoption, some of the core bits need some refactoring / design
> retrofit. To this end, I've started a "revolutionary branch[1]." I've
> listed some of my rationale as to why I think this is a good thing in
> the JIRA, but I'm happy to go into detail here.
>
> My main motivation for this comes from working on Flume and supporting
> it in my day job at Cloudera. That said, I do this as an individual,
> and with my ASF hat firmly in place. My (short) rationale:
> * I think the code base is too complex and that this is a barrier to
> greater developer adoption. The internals shouldn't be scary.
>

+1. We can also enhance the Developer Guide to describe which package does what.
The Cassandra community did something very good: for beginners, they posted some
JIRAs that would help them understand the internals. We can do something
similar.


> * Some of the invariants of Flume have varied and remnants gum up the
> works. For instance, there was a time where it was assumed there
> wouldn't be multiple logical nodes per physical node; the complexities
> of the threading came later.
> * A few advertised features do not work as we'd expect / like. I want
> to make it simpler to add these features.
> * A number of recent bugs have exposed some evolutionary
> implementation that could use refactoring.
> * Flume does too much. It should do a smaller number of things (that
> people really need / use) and do them exceedingly well. It's become
> clear that some features are more important to people than others.
>

Will we be migrating the package names to Apache as well?



>
> The details:
>
> * The branch is at
> http://svn.apache.org/viewvc/incubator/flume/branches/flume-728/
> * There is already a (significantly smaller) core of Flume and a
> skeletal Flume node.
> * The wiki page tracking my notes and the "project" is at
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
> * The parent JIRA tracking the project is at
> https://issues.apache.org/jira/browse/FLUME-728
>
> The process / intent:
> * I intend to move extremely fast on the flume-728 branch and then
> request a series of strict reviews and call for a vote to merge to
> trunk. I'm happy to take reviews in the interim.
> * I'd love folks to get involved and have this become a group effort.
> The reason I started was to have a baseline to speak from and show 1.
> that I'm serious (via code) and 2. what I think an implementation
> could look like.
>

I would recommend that we start specific discussion threads. The Flume community
is quite active, so I'm expecting a lot of great discussions :)


> * I fully understand the community / PPMC may -1 the merge (but that
> would make me sad, so why would you do that?). I also immediately
> regretted using the "NG" designation; it's presumptuous and I
> apologize. Going forward, I'll refer to it as flume-728.
>

IMHO, not a big deal. Appreciate your passion for Flume :)


>
> Excited to hear feedback or questions. Thanks.
>
> [1] jmhsieh pointed out an email from Long Ago(tm) that described this
> situation well. I'm following that approach, in spirit.
> http://incubator.apache.org/learn/rules-for-revolutionaries.html
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal