Posted to user@flume.apache.org by Mingjie Lai <mj...@gmail.com> on 2011/11/03 21:38:40 UTC

Re: RFC - Avoid Master restart when deploying custom plugins for ad-hoc queries/transformations

Frank.

 > If stub_collector gets reconfigured, would that force nodeA to restart?

No.

 > Would it interrupt the flow of events to main_collector?

It could; see below.

 > What happens to a Fanout Sink if one of the downstream nodes briefly
 > becomes unavailable?

Ideally, the failure of one sink in a fanout shouldn't impact the 
others. But the current fanout implementation is synchronous and 
single-threaded, which means one failed sink may cause the other sinks 
to block.

It can definitely be improved to remove the dependency between the 
sinks, by giving each of the fanout sinks its own thread. Do you want to 
file a JIRA for it?
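
To illustrate the difference, here is a minimal Java sketch of the two 
approaches. It is not Flume's actual code: the Sink interface and the 
SyncFanout/ThreadedFanout classes are hypothetical names made up for 
this example, and events are plain strings. It only shows why a 
synchronous loop couples the sinks and how one worker thread per sink 
could decouple them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FanoutSketch {
    /** Hypothetical sink interface for illustration; not Flume's real sink API. */
    public interface Sink {
        void append(String event);
    }

    /** Synchronous fanout: a sink that blocks or throws stalls every sink after it. */
    public static final class SyncFanout implements Sink {
        private final List<Sink> sinks;

        public SyncFanout(List<Sink> sinks) {
            this.sinks = new ArrayList<>(sinks);
        }

        @Override
        public void append(String event) {
            for (Sink s : sinks) {
                s.append(event); // an exception here skips the remaining sinks
            }
        }
    }

    /** Threaded fanout: each sink gets its own single worker thread and queue. */
    public static final class ThreadedFanout implements Sink {
        private final List<Sink> sinks;
        private final List<ExecutorService> workers = new ArrayList<>();

        public ThreadedFanout(List<Sink> sinks) {
            this.sinks = new ArrayList<>(sinks);
            for (int i = 0; i < this.sinks.size(); i++) {
                // One thread per sink keeps per-sink event order.
                // Assumption: the queue is unbounded here; a real version would bound it.
                workers.add(Executors.newSingleThreadExecutor());
            }
        }

        @Override
        public void append(String event) {
            for (int i = 0; i < sinks.size(); i++) {
                final Sink s = sinks.get(i);
                workers.get(i).execute(() -> {
                    try {
                        s.append(event);
                    } catch (RuntimeException e) {
                        // Failure stays local to this sink; a real version would retry or report.
                    }
                });
            }
        }

        /** Stops accepting events and waits briefly for queued events to drain. */
        public void close() {
            for (ExecutorService w : workers) {
                w.shutdown();
            }
            try {
                for (ExecutorService w : workers) {
                    w.awaitTermination(5, TimeUnit.SECONDS);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With SyncFanout, a failing first sink prevents the second sink from 
seeing the event at all. With ThreadedFanout, the failure is contained 
in that sink's worker and the other sinks keep receiving events. The 
trade-off is that events queue up in memory for a slow sink, so a real 
implementation would need bounded queues and some failure reporting.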

Regarding Kafka, my naive understanding is based on a conference talk 
given by LinkedIn folks. I don't really know the status of the project, 
only the concept behind it.

Thanks,
Mingjie

On 10/28/2011 07:52 AM, Frank Grimes wrote:
> Hi Mingjie,
>
> I'll definitely take a closer look at Kafka.
>
> However, could the node restart and event flow interruption not be
> avoided by adding the following indirection?
>
> exec config main_collector
> exec config stub_collector 'logicalSource' 'null'
>
> exec config nodeA 'tailSource("/var/log/apache.out")'
> '[logicalSink("main_collector"), logicalSink("stub_collector")]'
>
> If stub_collector gets reconfigured, would that force nodeA to restart?
> Would it interrupt the flow of events to main_collector?
> What happens to a Fanout Sink if one of the downstream nodes briefly
> becomes unavailable?
>
> Thanks,
>
> Frank Grimes
>
>
> ------------------------------------------------------------------------
> *From:* Mingjie Lai <mj...@gmail.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Wednesday, October 26, 2011 4:40:40 PM
> *Subject:* Re: RFC - Avoid Master restart when deploying custom plugins
> for ad-hoc queries/transformations
>
> Frank.
>
> I think you're talking about 2 things:
> 1) create a new flow from flume for new collectors
> 2) loading new plugins without restarting flume nodes + master.
>
> For 1),
>  > a requirement to allow our developers to dynamically
>  > subscribe/listen to events passing through
>
> IMHO, flume is designed for moving data (a lot of data) from one place
> to another. It's not really a subscription based message queue system
> for a lot of dynamic listeners -- such as Kafka. Adding fanouts can be
> used to solve the issue, but it needs a reconfiguration for a new
> collector -- which means restarting the existing node, and the existing
> collectors will be impacted due to the restart.
>
> It really depends on how frequently you want to do the
> subscription/redirection.
>
> For 2): right now flume doesn't support loading new plugins w/o
> restarting master/nodes. But it wouldn't be too difficult to add the
> feature. hbase coprocessors is a similar feature, where hbase users can
> add/remove a new coprocessor(plugin) from shell. The new coprocessor can
> be loaded from a jar file located at hdfs or local fs. It's a reasonable
> requirement for flume. But rarely ppl asked for it. :)
>
>  > i.e. calling flume node -n node-name -c 'node-name: rpcSrc(55555) |
>  > customDecorator1…customDecoratorN sink;'
>
> It could work. But I'd say it may be hard for ops to maintain the whole
> system.
>
> Thanks.
> Mingjie
>
>
> On 10/21/2011 02:14 PM, Frank Grimes wrote:
>  > Hi All,
>  >
>  > We have a requirement to allow our developers to dynamically
>  > subscribe/listen to events passing through Flume so that they could
>  > query/transform/redirect some of the events.
>  >
>  > We imagined that it would simply be a matter of spinning up a new
>  > Collector Node process with the custom plugins and then modifying the
>  > flow to fan out to that new Collector Node so that the events could be
>  > observed.
>  >
>  > However, we see that currently the Master needs plugins in its classpath
>  > as well so that it can validate configs before pushing them to its managed
>  > Nodes. (https://cwiki.apache.org/FLUME/troubleshooting-faq.html)
>  >
>  > We also see that there is a feature request for hot-deployable plugins,
>  > but it looks like it's pretty far off.
>  > (https://issues.apache.org/jira/browse/FLUME-147)
>  >
>  > We've done a little experimenting and believe that we may have come up
>  > with an interim solution and wanted to run it by this list for some peer
>  > review.
>  >
>  > Essentially, spinning up a new orphaned (no available Master) physical
>  > Collector node (with the custom plugins in its classpath) and with an
>  > initial configuration seems to do the trick.
>  > i.e. calling flume node -n node-name -c 'node-name: rpcSrc(55555) |
>  > customDecorator1…customDecoratorN sink;'
>  >
>  > Thoughts? Potential problems with this approach? Better ways to do this?
>  >
>  > Thanks,
>  >
>  > Frank Grimes
>
>