Posted to dev@flume.apache.org by Luiz Geovani Vier <lg...@gmail.com> on 2014/09/08 16:50:53 UTC

Asynchronous SpillableMemoryChannel

Hello, dear Flume devs,

I'm currently experimenting with Flume's embedded agent to relay
~10MB/s worth of events to another set of Flume agents.
There won't be much memory available, so the SpillableMemoryChannel looked
like a good alternative. However:

* This agent needs to receive the events in a "fire and forget" approach so
that it doesn't impact the application's performance.
When the SpillableMemoryChannel starts spilling to disk, the performance is
significantly impaired due to the synchronous I/O calls.
* If the channel becomes full, I'd like to discard older events instead of
rejecting new ones, as the current events are more important in this case
(application/usage metrics).
* When the SpillableMemoryChannel contains a lot of data on disk, it takes
several minutes to become available after a restart, preventing Flume (and
in this case the application that is embedding it as well) from accepting
events during this period.

With that in mind, I started writing a new channel which is basically a
MemoryChannel with a background thread that starts moving events into a
FileChannel before the MemoryChannel becomes full.
It also starts the FileChannel in the background, so Flume can begin accepting
events into its MemoryChannel immediately.
Moreover, it can discard older events from the FileChannel if necessary to
accommodate new events spilling from the MemoryChannel.
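
A minimal sketch of the design described above, for illustration only. It is
not the code from the author's repository: the class name AsyncSpillSketch, its
methods, and the thresholds are hypothetical, a plain in-memory deque stands in
for the FileChannel, and strings stand in for Flume events. It assumes the
mover drains the oldest in-memory events once a threshold is crossed, and that
the overflow store evicts its own oldest entry when it is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: none of these names come from Flume or from the author's repository.
class AsyncSpillSketch {
    private final BlockingQueue<String> memory;                 // stands in for the MemoryChannel
    private final Deque<String> overflow = new ArrayDeque<>();  // stands in for the FileChannel
    private final int spillThreshold;
    private final int overflowCapacity;
    private long dropped;

    AsyncSpillSketch(int memoryCapacity, int spillThreshold, int overflowCapacity) {
        this.memory = new ArrayBlockingQueue<>(memoryCapacity);
        this.spillThreshold = spillThreshold;
        this.overflowCapacity = overflowCapacity;
    }

    // Producer path: never blocks and never touches the overflow store.
    boolean put(String event) {
        return memory.offer(event);
    }

    // One pass of the mover: drain the oldest in-memory events down to the threshold.
    synchronized int spillOnce() {
        int moved = 0;
        while (memory.size() > spillThreshold) {
            String event = memory.poll();
            if (event == null) {
                break;
            }
            if (overflow.size() >= overflowCapacity) {
                overflow.pollFirst(); // overflow is full: discard its oldest event
                dropped++;
            }
            overflow.addLast(event);
            moved++;
        }
        return moved;
    }

    // Consumer path: spilled events are older, so they are returned first.
    synchronized String take() {
        String spilled = overflow.pollFirst();
        return spilled != null ? spilled : memory.poll();
    }

    synchronized long droppedCount() {
        return dropped;
    }

    // Runs spillOnce() on a daemon thread so producers never wait on the overflow store.
    Thread startMover(long intervalMillis) {
        Thread mover = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    spillOnce();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "spill-mover");
        mover.setDaemon(true);
        mover.start();
        return mover;
    }
}
```

In this sketch, calling startMover(50) would run the mover roughly every 50 ms.
A real channel would also need transactions and durable storage, which are
left out here.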

The code can be found here:
Project: https://github.com/lgvier/flume-async-spillable-mem-channel
Class:
https://github.com/lgvier/flume-async-spillable-mem-channel/blob/master/src/main/java/org/apache/flume/channel/AsyncSpillableMemoryChannel.java

I'd appreciate your input on it very much.
Does it seem like a good approach?
Would there be a better solution for this scenario? I'm also considering
using the Flume RPC client with third-party queueing mechanisms, but would
prefer an end-to-end Flume solution.
Is it useful for the Flume community?

Thank you,
-Geovani

Re: Asynchronous SpillableMemoryChannel

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Geovani,

I was re-reading your email and had the same reaction as Roshan.  I'll
comment inline.  Note how I numbered your points for easier discussion.


On Mon, Sep 8, 2014 at 6:31 PM, Roshan Naik <ro...@hortonworks.com> wrote:

> it may be better to make this async behavior a policy of the spillable
> channel rather than introduce a new channel.
> -roshan
>
> On Mon, Sep 8, 2014 at 7:50 AM, Luiz Geovani Vier <lg...@gmail.com>
> wrote:
>
> > Hello, dear Flume devs,
> >
> > I'm currently experimenting with using flume's embedded agent to relay
> > ~10MB/s worth of events to another set of flume agents.
> > There won't be much memory available, so the SpillableMemoryChannel
> > looked like a good alternative, however:
> >
> > 1) This agent needs to receive the events in a "fire and forget"
> > approach so that it doesn't impact the application's performance.
>

Is this something that SpillableMemoryChannel cannot do today?  Please see
the next question.


> > When the SpillableMemoryChannel starts spilling to disk, the performance
> > is significantly impaired due to the synchronous I/O calls.
>

So the I/O comes into play only if the MemoryChannel (MC) is full.  But if MC
is not full, are you saying that MC doesn't provide "fire and forget"
functionality, or that it's simply not fast enough for your needs?


> > 2) If the channel becomes full, I'd like to discard older events instead
> > of rejecting new ones, as the current events are more important in this
> > case (application/usage metrics).
>

Could this be added to the existing SpillableMemoryChannel (SMC) as a new
option: purge events older than X (your impl?), purge the oldest events and
keep only the last N, or keep them all (current behaviour)?
It may be nicer to have an SMC that can do that than a separate, new
channel with a lot of overlap.
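
To make the three options concrete, here is a minimal sketch of how such a
retention setting could behave. Everything in it is hypothetical: the
RetentionSketch class, the Policy names, and the single limit parameter are
illustrations, not existing SpillableMemoryChannel options.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: these names are not Flume configuration keys or classes.
class RetentionSketch {
    enum Policy { KEEP_ALL, MAX_AGE, LAST_N }

    private static final class Stamped {
        final long timestampMillis;
        final String body;

        Stamped(long timestampMillis, String body) {
            this.timestampMillis = timestampMillis;
            this.body = body;
        }
    }

    private final Deque<Stamped> queue = new ArrayDeque<>();
    private final Policy policy;
    private final long limit; // max age in ms for MAX_AGE, max count for LAST_N, unused for KEEP_ALL

    RetentionSketch(Policy policy, long limit) {
        this.policy = policy;
        this.limit = limit;
    }

    // Events are assumed to arrive in timestamp order.
    void add(long timestampMillis, String body) {
        queue.addLast(new Stamped(timestampMillis, body));
    }

    // Applies the policy and returns how many events were purged.
    int trim(long nowMillis) {
        int purged = 0;
        switch (policy) {
            case MAX_AGE:
                while (!queue.isEmpty() && nowMillis - queue.peekFirst().timestampMillis > limit) {
                    queue.pollFirst();
                    purged++;
                }
                break;
            case LAST_N:
                while (queue.size() > limit) {
                    queue.pollFirst();
                    purged++;
                }
                break;
            default:
                break; // KEEP_ALL: nothing is purged
        }
        return purged;
    }

    int size() {
        return queue.size();
    }

    // Body of the oldest retained event, or null if the queue is empty.
    String oldest() {
        Stamped head = queue.peekFirst();
        return head == null ? null : head.body;
    }
}
```

With KEEP_ALL the queue only grows, which matches the current behaviour of
rejecting new events once capacity is reached.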


> > 3) When the SpillableMemoryChannel contains a lot of data on disk, it
> > takes several minutes to become available after a restart, preventing
> > Flume (and in this case the application that is embedding it as well)
> > from accepting events during this period.
>

Ouch, I wasn't aware of this!

> > With that in mind, I started writing a new channel which is basically a
> > MemoryChannel with a background thread that starts moving events into a
> > FileChannel before the MemoryChannel becomes full.
>

Isn't SMC very similar in the sense that it starts writing new events to the
FileChannel (FC) when MC becomes full?
Or are you saying that your implementation moves *old* events from MC to FC
and always adds new events to MC?


> > It also starts the FileChannel in background, so Flume can begin
> > accepting events into its MemoryChannel immediately.
>

So is this because with SMC all the old events need to be read from disk and
sent out, and only then can new events start getting added to MC again?


> > Moreover, it can discard older events from the FileChannel if necessary
> > to accommodate new events spilling from the MemoryChannel.
> >
> > The code can be found here:
> > Project: https://github.com/lgvier/flume-async-spillable-mem-channel
> > Class:
> >
> > https://github.com/lgvier/flume-async-spillable-mem-channel/blob/master/src/main/java/org/apache/flume/channel/AsyncSpillableMemoryChannel.java
> >
> > I'd appreciate your input on it very much.
> > Does it seem like a good approach?
> > Would there be a better solution for this scenario? I'm also considering
> > using the Flume RPC client with third-party queueing mechanisms, but would
> > prefer an end-to-end flume solution.
> > Is it useful for the Flume community?
>

I can't comment about the quality of the implementation - I'm not qualified
to do that for Flume, but from your description this sounds very desirable
and I'd love to see this as part of Flume!

Is there anything that SMC can do that your ASMC cannot do?
For example, can your ASMC operate in the mode where it doesn't remove old
events?
If there is something that SMC can do that ASMC cannot, but you can add
that to ASMC, then I don't see why Flume developers would not "adopt" your
ASMC and deprecate SMC.

Otis

Re: Asynchronous SpillableMemoryChannel

Posted by Roshan Naik <ro...@hortonworks.com>.
It may be better to make this async behavior a policy of the spillable
channel rather than introduce a new channel.
-roshan

