Posted to dev@flume.apache.org by Juhani Connolly <ju...@cyberagent.co.jp> on 2012/07/13 03:51:34 UTC

Batching events from event driven sources?

Hi,

Raymond's mail to user@flume "performance on RecoverableMemoryChannel vs 
JdbcChannel" got me thinking about how to deal with batching events from 
any type of event-driven source. Since we have no control over when the 
events arrive, they will generally trickle in one at a time.

I have a couple of half-baked ideas to resolve this:

- Logic in the source to keep a transaction open over a few requests... 
This breaks current conventions (since at the end of the request, the 
event hasn't been committed to the channel). Then again, in SyslogSource 
there is no way to tell the sender that an event wasn't inserted 
properly, so the risk of loss doesn't change much.
- Add some minimum batch size setting to FileChannel, which delays 
flushes until either a) a certain delay has passed since the last flush 
or b) x events have accumulated.
- Create a BatchingChannel... Basically, configure it with a child 
channel, and it will receive puts, storing them in memory. After a 
configured number of events has accumulated, it puts them to the child 
channel as a single batch. This again allows for the loss of up to the 
configured number of events, but no more.
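To make the batching idea concrete, here is a rough sketch of the buffering logic that both the FileChannel-setting idea and the BatchingChannel wrapper would need: accumulate puts, and flush downstream once either a batch-size or a max-delay threshold is hit. This is purely illustrative; the class name, the generic Consumer standing in for the child channel, and the thresholds are all my own invention, not Flume's actual Channel/Transaction API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the batching logic discussed above.
// A downstream Consumer<List<E>> stands in for the child channel;
// a real implementation would wrap the flush in one channel transaction.
public class BatchingBuffer<E> {
    private final int batchSize;          // flush when this many events are buffered
    private final long maxDelayMillis;    // ...or when this much time has passed
    private final Consumer<List<E>> downstream;
    private final List<E> buffer = new ArrayList<>();
    private long lastFlush = System.currentTimeMillis();

    public BatchingBuffer(int batchSize, long maxDelayMillis,
                          Consumer<List<E>> downstream) {
        this.batchSize = batchSize;
        this.maxDelayMillis = maxDelayMillis;
        this.downstream = downstream;
    }

    // Accept one event; flush if either threshold has been reached.
    public synchronized void put(E event) {
        buffer.add(event);
        long now = System.currentTimeMillis();
        if (buffer.size() >= batchSize || now - lastFlush >= maxDelayMillis) {
            flush();
        }
    }

    // Push all buffered events downstream as one batch.
    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            downstream.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlush = System.currentTimeMillis();
    }

    public synchronized int pending() {
        return buffer.size();
    }
}
```

One caveat this sketch makes visible: the delay check only runs when a new event arrives, so a real version would also need a background timer to flush a stale partial batch, and everything buffered in memory is exactly the window of potential loss mentioned above.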

Any other alternatives/ideas? None of the above feels 100% satisfactory 
to me, though I think we will have to make some compromise if we want 
decent performance between event-driven sources and FileChannel.