Posted to dev@flume.apache.org by Juhani Connolly <ju...@cyberagent.co.jp> on 2012/03/05 06:52:26 UTC

Roadmap for v1.1.0

In the "poor code reviews" discussion, Mike Percy suggested opening up a 
thread regarding the roadmap for 1.1.0 and beyond, so here's a go at 
kicking that off.

I think the following questions present themselves, along with my 
opinions:

- When do we hope to make the next solid release? Do we have a planned 
schedule (that I may be unaware of)?
Personally I am not too attached to fixing a date in advance. I would 
prefer to agree on a fixed set of issues to prioritize, then limit the 
branch to bug fixes only (moving any further dev to a separate branch), 
and push that out as the next release once it has had sufficient testing 
and the harmful bugs have been removed.

- What belongs in 1.1.0?
I for one think that for any log delivery infrastructure the core parts 
for delivery mechanisms and error recovery mechanisms should be of 
primary importance, and this is what I've been trying to work on. I do 
not feel that any further sources or sinks are necessary, but feel that 
for delivery mechanisms, the lack of a FileChannel is pretty painful. I 
also feel that a buffering mechanism (as in scribed) that can store 
channel overflow in a long-term medium should be a priority.
I am unsure about configuration overhauls. We have one configuration 
method that works. Should a centralized one be an immediate target, or 
one for 1.1.0? Should refactoring the configuration be a priority (it was 
pointed out that FlumeConfiguration has become a god class)?
There are a few other leftovers from flume-728: metric collection 
infrastructure, documentation, master. Should these be targets for 1.1.0 
or for further down the road?
We should probably also make clear which components need to be thread 
safe and which don't. We should also verify this is the case.
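
On the overflow-buffering point above, here is a minimal sketch of the idea: 
a bounded in-memory queue that spills to a file once it is full. It is 
written in Python for brevity and is not Flume code. `SpillChannel`, its 
methods, and the one-event-per-line file layout are all hypothetical, and 
the sketch assumes events are strings without newlines.

```python
import os
from collections import deque


class SpillChannel:
    """Hypothetical channel: bounded memory queue that overflows to a file."""

    def __init__(self, capacity, spill_path):
        self.capacity = capacity
        self.memory = deque()
        self.spill_path = spill_path
        self.spilled = 0          # events currently waiting in the spill file
        self.read_offset = 0      # byte offset of the next spilled event

    def put(self, event):
        # Once anything is on disk, later events must follow it to keep order.
        if self.spilled == 0 and len(self.memory) < self.capacity:
            self.memory.append(event)
            return
        with open(self.spill_path, "ab") as spill:
            spill.write(event.encode("utf-8") + b"\n")
        self.spilled += 1

    def take(self):
        """Return the oldest event, or None when the channel is empty."""
        if self.memory:
            return self.memory.popleft()
        if self.spilled == 0:
            return None
        with open(self.spill_path, "rb") as spill:
            spill.seek(self.read_offset)
            line = spill.readline()
            self.read_offset = spill.tell()
        self.spilled -= 1
        if self.spilled == 0:
            os.remove(self.spill_path)   # fully drained; start a fresh file
            self.read_offset = 0
        return line[:-1].decode("utf-8")
```

The sketch is neither thread safe nor crash safe (the counters live only in 
memory), so it illustrates the ordering problem rather than a design for a 
real FileChannel.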

Re: Roadmap for v1.1.0

Posted by Mike Percy <mp...@cloudera.com>.

Hi Juhani,

> I'm not very familiar regarding most of the Avro stuff so I'll have to take your word on this ;)
It looks like there are some questions around interop but I think the Netty implementation gives us a lot of room to grow in terms of performance. If I can find some extra time I may look into it more.

> I posted a ticket for Sinks at FLUME-1019. Feel free to check it out, and post one for sources too if you like.

Great! I responded to that on the review board. I'll add a patch for some of the other components pretty soon.

Best,
Mike

Re: Roadmap for v1.1.0

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

On 03/05/2012 04:29 PM, Mike Percy wrote:
> On Mar 4, 2012, at 9:52 PM, Juhani Connolly wrote:
>
>> In the "poor code reviews" discussion, Mike Percy suggested opening up a thread regarding the roadmap for 1.1.0 and beyond, so here's a go at kicking that off.
>>
>> I think the following questions present themselves, along with my opinions:
>>
>> - When do we hope to make the next solid release? Do we have a planned schedule(that I may be unaware of?)
>> Personally I am not too attached to deciding a date in advance and would prefer to decide a fixed set of issues that we prioritize to fix, then limit the branch to bug fixes only(moving any further dev to a separate branch), and push that out as the next release when sufficient testing has been made with harmful bugs removed.
> I'd be inclined to try to release as often as we think we have useful features and bug fixes implemented, to maintain a rhythm and keep the vitality of the project high. I think releasing often also helps encourage users to engage with the developer community and try out and vet experimental features.
That seems fine to me, but if we're to do that, we must stay on top of 
half-complete stuff. If we're to commit to a release-often type of 
schedule, we need to make sure that as we refine expected behaviors we 
also fix anything that does not adhere to them (unfortunately I'm also 
guilty of this: I'm waiting to see what happens with Brock's patch, which 
might introduce a fatal exception for Sinks in addition to 
EventDeliveryException, before I do something about it).

>> - What belongs in 1.1.0?
>> I for one think that for any log delivery infrastructure the core parts for delivery mechanisms and error recovery mechanisms should be of primary importance, and this is what I've been trying to work on. I do not feel that any further sources or sinks are necessary, but feel that for delivery mechanisms, the lack of a FileChannel is pretty painful. I also feel that a buffering mechanism(as in scribed), allowing to store channel overflow in a long-term medium should be a priority.
> I tend agree with what you're saying, although I don't really have an aversion to integrating more Sinks as long as they have maintainers. I agree that a long term buffering solution is very important, I think that would be part of FileChannel though. Overall I think we should strive for correctness in the core, medium term API stability, and system speed, in that order for the next release. The primary thing I am looking at right now is the RPC mechanism, to ensure we are set up to take full advantage of Avro RPC performance features and ensure that remote clients can integrate with Flume in the future. I have some concerns there and I'll start a thread about it tomorrow probably, since if there are reasons to break wire compatibility we should do it as early as possible in the life of 1.x. (incidentally I also think we should start calling it 1.x instead of NG to avoid coining terms like Flume ONG and Flume NNG for 2.x :)
I'm not very familiar with most of the Avro stuff so I'll have to 
take your word on this ;)
>
> Along the vein of system interfaces, one big thing that I think is missing in Flume is Javadoc of all the core interfaces and classes. This is something I am certainly willing to work on. Mainly I believe that the various interface contracts need to be strongly specified in the base class Javadoc so that it's easier to tell if something is wrong and to ensure consistency across implementations. For example, if there is an error delivering an event should a Sink return BACKOFF or throw an EventDeliveryException? I'm not sure why one is a return value and the other is an exception, but we should make sure consequences and best practices are documented, and any Sinks in the core should be consistent. I'm still getting my head around the system and using the source (, Luke) to figure these things out. But hopefully future devs and API users won't have to do that as much.
I've felt this for a while too. Rather than just sit around on it I 
posted a ticket for Sinks at FLUME-1019. Feel free to check it out, and 
post one for sources too if you like.

>
> One more thing that I think is important, while not really related to a software release per se, is coming up with stories around how common use cases are supposed to work or eventually be possible. Something I've been thinking about a lot is Apache web server log collection onto HDFS. While tail source is known to be problematic (deserves a FAQ entry), we should provide explanations and best practices for the most common cases. (In this case I think it involves writing an apache httpd mod_flume module that speaks Avro). We can then eventually provide code for these most common cases when we have time to implement them or as they are contributed. These very common use cases and the stories around them should inform our design decisions.
I mentioned in the cancelled tail-source issue that we could always 
produce a tail client that sends Avro messages. It could be written in a 
language that can use inodes (though this would be platform specific).
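
As a rough illustration of the inode idea, here is a minimal sketch of a 
tailer that notices log rotation by comparing inode numbers. It is Python, 
assumes a POSIX-style filesystem, and leaves out the Avro sending entirely. 
`InodeTail` and `poll` are hypothetical names.

```python
import os


class InodeTail:
    """Hypothetical tailer that follows `path` across log rotation."""

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0

    def poll(self):
        """Return complete lines appended since the previous call."""
        try:
            stat = os.stat(self.path)
        except FileNotFoundError:
            return []                      # mid-rotation; try again later
        if stat.st_ino != self.inode or stat.st_size < self.offset:
            # New inode (rotated) or shrunken file (truncated): start over.
            self.inode = stat.st_ino
            self.offset = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
        # Hold back a trailing partial line until its newline arrives.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

Note that anything written to the old file between the last poll and the 
rotation is lost in this sketch. A real client would need to keep the old 
file open until it is drained.
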
>> I am unsure of configuration overhauls. We have one configuration method that works. Should a centralized one be an immediate target or one for 1.1.0. Should refactoring the  configuration be a priority(it was pointed out that FlumeConfiguration has become a god class)?
>> There are a few other leftovers from flume-728: metric collection infrastructure, documentation, master. Should these be targets for 1.1.0 or for further down the road?
>> We should probably also make clear which components need to be thread safe and which don't. We should also verify this is the case.
> OK so my understanding is that some changes to how we do config validation are required to be able to write a tool to validate Flume configs without having to start an agent. The idea is for this functionality to be separated from the core to some extent so that the validation mechanism can be exposed as an API. The initial request for an API came from the Cloudera enterprise team, who wants to add Flume configuration validation support in the Cloudera Manager app. Personally I think it would be a great feature to have in a command line tool as well. From an operations perspective, it's nice to have the ability to check that your config is valid before pushing it, instead of finding out your config is broken once you deploy to all your agents… especially if you are in an emergency production situation and you need to make changes fast. If you have concerns about the implementation beyond the issues that Eric raised, or even if you agree/disagree with the current feedback on the review, then I know Hari would appreciate any constructive feedback that you or other folks can provide. Of course if folks think that it's an undesirable feature, have concerns, or think there is a better way to design it then they should definitely speak up in the JIRA, the review tool, or here as well.
I haven't specifically looked at the entirety of Hari's change, though I 
intend to. I was referring more to the current state of 
FlumeConfiguration which, as Hari mentioned, is already kind of a god 
class. I certainly think the intention behind Hari's changes is a good one.
When I referred to a configuration overhaul I was also thinking of 
breaking up that god class, and getting all the abstract logic out of 
PropertiesFileConfigurationProvider, where it probably shouldn't be.

> Anyway, I think other folks should chime in on this thread and we should ultimately morph this discussion into a list of JIRAs for inclusion into a 1.1.0. And I would advocate that the rest would move to 1.2.0 by default.
If we could do that, that would be great.
> Regards,
> Mike
>
>

Re: Roadmap for v1.1.0

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

On 03/05/2012 04:44 PM, Mike Percy wrote:
>
>>> There are a few other leftovers from flume-728: metric collection infrastructure, documentation, master. Should these be targets for 1.1.0 or for further down the road?
>>> We should probably also make clear which components need to be thread safe and which don't. We should also verify this is the case.
> What do you mean by Master?
Centralized configuration with ZooKeeper, possibly agent failover.
>
> +1 on documenting thread safety and providing much more documentation in general.
>
> I'm not sure about exposing metrics for 1.1.0… while it's important for folks running Flume and we should make it a high priority, I think we could probably provide enough value with more important stuff to justify a next release without it, if we are releasing frequently. Then again if someone wanted to work on JMX support or something like that I wouldn't be against it!
>
> Regards,
> Mike
>
>
I don't see them as an immediate concern either; we'll have to see what 
others think.

Re: Roadmap for v1.1.0

Posted by Mike Percy <mp...@cloudera.com>.

Sorry, I missed a couple things at the end (inline)

On Mar 4, 2012, at 11:29 PM, Mike Percy wrote:

> On Mar 4, 2012, at 9:52 PM, Juhani Connolly wrote:
> 
>> In the "poor code reviews" discussion, Mike Percy suggested opening up a thread regarding the roadmap for 1.1.0 and beyond, so here's a go at kicking that off.
>> 
>> I think the following questions present themselves, along with my opinions:
>> 
>> - When do we hope to make the next solid release? Do we have a planned schedule(that I may be unaware of?)
>> Personally I am not too attached to deciding a date in advance and would prefer to decide a fixed set of issues that we prioritize to fix, then limit the branch to bug fixes only(moving any further dev to a separate branch), and push that out as the next release when sufficient testing has been made with harmful bugs removed.
> 
> I'd be inclined to try to release as often as we think we have useful features and bug fixes implemented, to maintain a rhythm and keep the vitality of the project high. I think releasing often also helps encourage users to engage with the developer community and try out and vet experimental features.
> 
>> - What belongs in 1.1.0?
>> I for one think that for any log delivery infrastructure the core parts for delivery mechanisms and error recovery mechanisms should be of primary importance, and this is what I've been trying to work on. I do not feel that any further sources or sinks are necessary, but feel that for delivery mechanisms, the lack of a FileChannel is pretty painful. I also feel that a buffering mechanism(as in scribed), allowing to store channel overflow in a long-term medium should be a priority.
> 
> I tend agree with what you're saying, although I don't really have an aversion to integrating more Sinks as long as they have maintainers. I agree that a long term buffering solution is very important, I think that would be part of FileChannel though. Overall I think we should strive for correctness in the core, medium term API stability, and system speed, in that order for the next release. The primary thing I am looking at right now is the RPC mechanism, to ensure we are set up to take full advantage of Avro RPC performance features and ensure that remote clients can integrate with Flume in the future. I have some concerns there and I'll start a thread about it tomorrow probably, since if there are reasons to break wire compatibility we should do it as early as possible in the life of 1.x. (incidentally I also think we should start calling it 1.x instead of NG to avoid coining terms like Flume ONG and Flume NNG for 2.x :)
> 
> Along the vein of system interfaces, one big thing that I think is missing in Flume is Javadoc of all the core interfaces and classes. This is something I am certainly willing to work on. Mainly I believe that the various interface contracts need to be strongly specified in the base class Javadoc so that it's easier to tell if something is wrong and to ensure consistency across implementations. For example, if there is an error delivering an event should a Sink return BACKOFF or throw an EventDeliveryException? I'm not sure why one is a return value and the other is an exception, but we should make sure consequences and best practices are documented, and any Sinks in the core should be consistent. I'm still getting my head around the system and using the source (, Luke) to figure these things out. But hopefully future devs and API users won't have to do that as much.
> 
> One more thing that I think is important, while not really related to a software release per se, is coming up with stories around how common use cases are supposed to work or eventually be possible. Something I've been thinking about a lot is Apache web server log collection onto HDFS. While tail source is known to be problematic (deserves a FAQ entry), we should provide explanations and best practices for the most common cases. (In this case I think it involves writing an apache httpd mod_flume module that speaks Avro). We can then eventually provide code for these most common cases when we have time to implement them or as they are contributed. These very common use cases and the stories around them should inform our design decisions.
> 
>> I am unsure of configuration overhauls. We have one configuration method that works. Should a centralized one be an immediate target or one for 1.1.0. Should refactoring the  configuration be a priority(it was pointed out that FlumeConfiguration has become a god class)?
> 
> OK so my understanding is that some changes to how we do config validation are required to be able to write a tool to validate Flume configs without having to start an agent. The idea is for this functionality to be separated from the core to some extent so that the validation mechanism can be exposed as an API. The initial request for an API came from the Cloudera enterprise team, who wants to add Flume configuration validation support in the Cloudera Manager app. Personally I think it would be a great feature to have in a command line tool as well. From an operations perspective, it's nice to have the ability to check that your config is valid before pushing it, instead of finding out your config is broken once you deploy to all your agents… especially if you are in an emergency production situation and you need to make changes fast. If you have concerns about the implementation beyond the issues that Eric raised, or even if you agree/disagree with the current feedback on the review, then I know Hari would appreciate any constructive feedback that you or other folks can provide. Of course if folks think that it's an undesirable feature, have concerns, or think there is a better way to design it then they should definitely speak up in the JIRA, the review tool, or here as well.
> 
> Anyway, I think other folks should chime in on this thread and we should ultimately morph this discussion into a list of JIRAs for inclusion into a 1.1.0. And I would advocate that the rest would move to 1.2.0 by default.

> 
>> There are a few other leftovers from flume-728: metric collection infrastructure, documentation, master. Should these be targets for 1.1.0 or for further down the road?
>> We should probably also make clear which components need to be thread safe and which don't. We should also verify this is the case.

What do you mean by Master?

+1 on documenting thread safety and providing much more documentation in general.

I'm not sure about exposing metrics for 1.1.0… while it's important for folks running Flume and we should make it a high priority, I think we could probably provide enough value with more important stuff to justify a next release without it, if we are releasing frequently. Then again if someone wanted to work on JMX support or something like that I wouldn't be against it!

Regards,
Mike

Re: Roadmap for v1.1.0

Posted by Mike Percy <mp...@cloudera.com>.

On Mar 4, 2012, at 9:52 PM, Juhani Connolly wrote:

> In the "poor code reviews" discussion, Mike Percy suggested opening up a thread regarding the roadmap for 1.1.0 and beyond, so here's a go at kicking that off.
> 
> I think the following questions present themselves, along with my opinions:
> 
> - When do we hope to make the next solid release? Do we have a planned schedule(that I may be unaware of?)
> Personally I am not too attached to deciding a date in advance and would prefer to decide a fixed set of issues that we prioritize to fix, then limit the branch to bug fixes only(moving any further dev to a separate branch), and push that out as the next release when sufficient testing has been made with harmful bugs removed.

I'd be inclined to try to release as often as we think we have useful features and bug fixes implemented, to maintain a rhythm and keep the vitality of the project high. I think releasing often also helps encourage users to engage with the developer community and try out and vet experimental features.

> - What belongs in 1.1.0?
> I for one think that for any log delivery infrastructure the core parts for delivery mechanisms and error recovery mechanisms should be of primary importance, and this is what I've been trying to work on. I do not feel that any further sources or sinks are necessary, but feel that for delivery mechanisms, the lack of a FileChannel is pretty painful. I also feel that a buffering mechanism(as in scribed), allowing to store channel overflow in a long-term medium should be a priority.

I tend to agree with what you're saying, although I don't really have an aversion to integrating more Sinks as long as they have maintainers. I agree that a long-term buffering solution is very important, though I think that would be part of FileChannel. Overall I think we should strive for correctness in the core, medium-term API stability, and system speed, in that order, for the next release. The primary thing I am looking at right now is the RPC mechanism, to ensure we are set up to take full advantage of Avro RPC performance features and that remote clients can integrate with Flume in the future. I have some concerns there and I'll probably start a thread about it tomorrow, since if there are reasons to break wire compatibility we should do it as early as possible in the life of 1.x. (Incidentally, I also think we should start calling it 1.x instead of NG, to avoid coining terms like Flume ONG and Flume NNG for 2.x :)

Along the vein of system interfaces, one big thing that I think is missing in Flume is Javadoc of all the core interfaces and classes. This is something I am certainly willing to work on. Mainly I believe that the various interface contracts need to be strongly specified in the base class Javadoc so that it's easier to tell if something is wrong and to ensure consistency across implementations. For example, if there is an error delivering an event should a Sink return BACKOFF or throw an EventDeliveryException? I'm not sure why one is a return value and the other is an exception, but we should make sure consequences and best practices are documented, and any Sinks in the core should be consistent. I'm still getting my head around the system and using the source (, Luke) to figure these things out. But hopefully future devs and API users won't have to do that as much.
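
To make the BACKOFF-versus-exception question concrete, here is a minimal 
sketch of one possible contract: BACKOFF means "nothing to do right now", 
while EventDeliveryException means "an event was available but delivery 
failed". This is only an assumption about how the contract could be 
specified, not a statement of what Flume does. It is written in Python for 
brevity; `SketchSink`, the `deliver` callable, and the deque standing in 
for a channel are hypothetical, and READY is assumed as the counterpart of 
BACKOFF.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    READY = "ready"      # an event was delivered; call process() again soon
    BACKOFF = "backoff"  # nothing to deliver; the caller may sleep first


class EventDeliveryException(Exception):
    """Raised when an event was available but could not be delivered."""


class SketchSink:
    """Hypothetical sink: drains `channel` and hands events to `deliver`."""

    def __init__(self, channel, deliver):
        self.channel = channel    # a deque standing in for a channel
        self.deliver = deliver    # callable that raises OSError on failure

    def process(self):
        if not self.channel:
            return Status.BACKOFF          # not an error, just idle
        event = self.channel[0]            # peek; remove only on success
        try:
            self.deliver(event)
        except OSError as exc:
            # The event stays in the channel so a later call can retry it.
            raise EventDeliveryException(f"could not deliver {event!r}") from exc
        self.channel.popleft()
        return Status.READY
```

Under this convention a caller can treat the return value as a scheduling 
hint and the exception as a failure to log and retry, which is the kind of 
distinction the Javadoc would need to spell out whichever way it is 
decided.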

One more thing that I think is important, while not really related to a software release per se, is coming up with stories around how common use cases are supposed to work or eventually be possible. Something I've been thinking about a lot is Apache web server log collection onto HDFS. While tail source is known to be problematic (deserves a FAQ entry), we should provide explanations and best practices for the most common cases. (In this case I think it involves writing an apache httpd mod_flume module that speaks Avro). We can then eventually provide code for these most common cases when we have time to implement them or as they are contributed. These very common use cases and the stories around them should inform our design decisions.

> I am unsure of configuration overhauls. We have one configuration method that works. Should a centralized one be an immediate target or one for 1.1.0. Should refactoring the  configuration be a priority(it was pointed out that FlumeConfiguration has become a god class)?
> There are a few other leftovers from flume-728: metric collection infrastructure, documentation, master. Should these be targets for 1.1.0 or for further down the road?
> We should probably also make clear which components need to be thread safe and which don't. We should also verify this is the case.

OK so my understanding is that some changes to how we do config validation are required to be able to write a tool to validate Flume configs without having to start an agent. The idea is for this functionality to be separated from the core to some extent so that the validation mechanism can be exposed as an API. The initial request for an API came from the Cloudera enterprise team, who wants to add Flume configuration validation support in the Cloudera Manager app. Personally I think it would be a great feature to have in a command line tool as well. From an operations perspective, it's nice to have the ability to check that your config is valid before pushing it, instead of finding out your config is broken once you deploy to all your agents… especially if you are in an emergency production situation and you need to make changes fast. If you have concerns about the implementation beyond the issues that Eric raised, or even if you agree/disagree with the current feedback on the review, then I know Hari would appreciate any constructive feedback that you or other folks can provide. Of course if folks think that it's an undesirable feature, have concerns, or think there is a better way to design it then they should definitely speak up in the JIRA, the review tool, or here as well.
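
For a sense of what validation without starting an agent could look like, 
here is a minimal sketch that checks only one thing: that every source and 
sink refers to a declared channel. It is Python, not the API under review, 
and `parse_properties` and `validate_config` are hypothetical names. The 
property layout (`<agent>.sources`, `<agent>.sources.<name>.channels`, 
`<agent>.sinks.<name>.channel`) is my understanding of the agent properties 
file and should be treated as an assumption.

```python
def parse_properties(text):
    """Parse simple `key = value` lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def validate_config(text, agent):
    """Return a list of problems found; an empty list means the checks passed."""
    props = parse_properties(text)
    errors = []
    declared = {kind: props.get(f"{agent}.{kind}", "").split()
                for kind in ("sources", "channels", "sinks")}
    channels = set(declared["channels"])
    if not channels:
        errors.append(f"{agent}: no channels declared")
    for source in declared["sources"]:
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            errors.append(f"source {source}: no channels configured")
        errors += [f"source {source}: unknown channel {c}"
                   for c in used if c not in channels]
    for sink in declared["sinks"]:
        used = props.get(f"{agent}.sinks.{sink}.channel")
        if used is None:
            errors.append(f"sink {sink}: no channel configured")
        elif used not in channels:
            errors.append(f"sink {sink}: unknown channel {used}")
    return errors
```

A command line tool could print the returned list and exit non-zero when it 
is not empty, which would cover the "check before you push" case.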

Anyway, I think other folks should chime in on this thread and we should ultimately morph this discussion into a list of JIRAs for inclusion in 1.1.0. And I would advocate that the rest move to 1.2.0 by default.

Regards,
Mike