Posted to user@flume.apache.org by Mingjie Lai <mj...@gmail.com> on 2011/11/03 21:38:40 UTC
Re: RFC - Avoid Master restart when deploying custom plugins for
ad-hoc queries/transformations
Frank.
> If stub_collector gets reconfigured, would that force nodeA to restart?
No.
> Would it interrupt the flow of events to main_collector?
It could; see below.
> What happens to a Fanout Sink if one of the downstream nodes briefly
> becomes unavailable?
Ideally, a failure of one sink in a fanout shouldn't impact the
others. But the current fanout implementation is synchronous and
single-threaded, which means one failed sink may cause the other
sinks to block.
It can definitely be improved to remove that dependency between the
sinks by giving each fanout sink its own thread. Do you want to
file a jira for it?
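To make the per-sink-thread idea concrete, here is a rough sketch in plain Java. It is not Flume's fanout code or API: the class name ThreadedFanout, the use of Consumer<String> as a stand-in for a sink, and the bounded per-sink queue are all assumptions made for illustration. Each sink drains its own queue on its own thread, so a sink that throws or stalls only affects its own queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch, not Flume's API: one queue and one thread per sink.
final class ThreadedFanout implements AutoCloseable {
    // Sentinel compared by identity, so no real event can be mistaken for it.
    private static final String POISON = new String("__stop__");

    private final List<BlockingQueue<String>> queues = new ArrayList<>();
    private final List<Thread> workers = new ArrayList<>();

    ThreadedFanout(List<Consumer<String>> sinks, int capacity) {
        for (Consumer<String> sink : sinks) {
            BlockingQueue<String> q = new LinkedBlockingQueue<>(capacity);
            Thread t = new Thread(() -> drain(q, sink));
            t.setDaemon(true);
            queues.add(q);
            workers.add(t);
            t.start();
        }
    }

    // Runs on the sink's own thread until the sentinel arrives.
    private static void drain(BlockingQueue<String> q, Consumer<String> sink) {
        try {
            while (true) {
                String event = q.take();
                if (event == POISON) {
                    return;
                }
                try {
                    sink.accept(event);
                } catch (RuntimeException e) {
                    // A failing sink drops this event; other sinks are unaffected.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Never blocks the caller: a full queue means that sink misses the event.
    // Returns how many sinks accepted the event into their queue.
    int append(String event) {
        int accepted = 0;
        for (BlockingQueue<String> q : queues) {
            if (q.offer(event)) {
                accepted++;
            }
        }
        return accepted;
    }

    // Lets each worker finish its backlog, then stops it.
    // Note: this waits on a sink that is stuck with a full queue.
    @Override
    public void close() {
        try {
            for (BlockingQueue<String> q : queues) {
                q.put(POISON);
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off in this sketch is that a slow sink loses events once its queue fills up, instead of stalling everyone else. A real fix would have to decide whether to drop, buffer to disk, or retry in that case.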
Regarding Kafka, my naive understanding is based on a conference talk
given by LinkedIn folks. I don't really know the status of the project,
only the concept behind it.
Thanks,
Mingjie
On 10/28/2011 07:52 AM, Frank Grimes wrote:
> Hi Mingjie,
>
> I'll definitely take a closer look at Kafka.
>
> However, could the node restart and event flow interruption not be
> avoided by adding the following indirection?
>
> exec config main_collector
> exec config stub_collector 'logicalSource' 'null'
>
> exec config nodeA 'tailSource("/var/log/apache.out")'
> '[logicalSink("main_collector"), logicalSink("stub_collector")]'
>
> If stub_collector gets reconfigured, would that force nodeA to restart?
> Would it interrupt the flow of events to main_collector?
> What happens to a Fanout Sink if one of the downstream nodes briefly
> becomes unavailable?
>
> Thanks,
>
> Frank Grimes
>
>
> ------------------------------------------------------------------------
> *From:* Mingjie Lai <mj...@gmail.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Wednesday, October 26, 2011 4:40:40 PM
> *Subject:* Re: RFC - Avoid Master restart when deploying custom plugins
> for ad-hoc queries/transformations
>
> Frank.
>
> I think you're talking about 2 things:
> 1) create a new flow from flume for new collectors
> 2) loading new plugins without restarting flume nodes + master.
>
> For 1),
> > a requirement to allow our developers to dynamically
> > subscribe/listen to events passing through
>
> IMHO, flume is designed for moving data (a lot of data) from one place
> to another. It's not really a subscription based message queue system
> for a lot of dynamic listeners -- such as Kafka. Adding fanouts can be
> used to solve the issue, but it needs a reconfiguration for a new
> collector -- which means restarting the existing node, and the existing
> collectors will be impacted due to the restart.
>
> It really depends on how frequently you want to do the
> subscription/redirection.
>
> For 2): right now flume doesn't support loading new plugins w/o
> restarting master/nodes. But it wouldn't be too difficult to add the
> feature. hbase coprocessors are a similar feature, where hbase users can
> add/remove a coprocessor (plugin) from the shell. The new coprocessor can
> be loaded from a jar file located on hdfs or the local fs. It's a
> reasonable requirement for flume. But people have rarely asked for it. :)
>
> > i.e. calling flume node -n node-name -c 'node-name: rpcSrc(55555) |
> > customDecorator1…customDecoratorN sink;'
>
> It could work. But I'd say it may be hard for ops to maintain the whole
> system.
>
> Thanks.
> Mingjie
>
>
> On 10/21/2011 02:14 PM, Frank Grimes wrote:
> > Hi All,
> >
> > We have a requirement to allow our developers to dynamically
> > subscribe/listen to events passing through Flume so that they could
> > query/transform/redirect some of the events.
> >
> > We imagined that it would simply be a matter of spinning up a new
> > Collector Node process with the custom plugins and then modifying the
> > flow to fan out to that new Collector Node so that the events could be
> > observed.
> >
> > However, we see that currently the Master needs plugins in its classpath
> > as well so that it can validate configs before pushing them to its managed
> > Nodes. (https://cwiki.apache.org/FLUME/troubleshooting-faq.html)
> >
> > We also see that there is a feature request for hot-deployable plugins,
> > but it looks like it's pretty far off.
> > (https://issues.apache.org/jira/browse/FLUME-147)
> >
> > We've done a little experimenting and believe that we may have come up
> > with an interim solution and wanted to run it by this list for some peer
> > review.
> >
> > Essentially, spinning up a new orphaned (no available Master) physical
> > Collector node (with the custom plugins in its classpath) and with an
> > initial configuration seems to do the trick.
> > i.e. calling flume node -n node-name -c 'node-name: rpcSrc(55555) |
> > customDecorator1…customDecoratorN sink;'
> >
> > Thoughts? Potential problems with this approach? Better ways to do this?
> >
> > Thanks,
> >
> > Frank Grimes
>
>