Posted to user@flume.apache.org by Brandon Bell <re...@gmail.com> on 2011/12/07 20:49:05 UTC

Re: Flume Questions

Hello,

I am currently exploring Flume as a replacement for a homegrown log
transport agent.  I've been playing around with it a bit, but I have a few
questions.

1.  How well does Flume perform in a multi-datacenter environment?
2.  How well does a single collector perform (with a second for failover)?
Or should both collectors be active, each serving as failover for the other?
     a. Approximately 10 GB of compressed data is loaded daily into HDFS.
The data starts uncompressed, and our current process compresses the files
prior to sending.  I believe Flume can handle the compression itself.
     b. How many agents can a single collector handle?  Currently there
are approx. 100 machines sending our data, and we expect even more in the
coming year.
3.  For fault tolerance, we would require multi-master.  How well would 3
masters work if they were all located in different DCs?
     a.  Preferably they would all sit in the destination DC along with the
collector(s) and Hadoop cluster because that's where Zookeeper is.
     b.  If the network drops for whatever reason between the master and an
agent in a different DC, what happens to the agent?  Does it lose state, or
does it buffer to disk until the network comes back up?
4.  Can the agents clean up after themselves?  Or would this require a
custom script using 'exec' that would essentially 'cat' a file and then
remove it?
     a.  If sending over the Internet, what security options do we have?
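
To make question 4 concrete, here is a minimal sketch of the kind of
"cat then remove" script meant there.  It is not part of Flume: the function
name emit_and_remove and the convention of passing file paths on the command
line are assumptions for illustration only.

```python
import os
import sys


def emit_and_remove(path, out=None):
    """Write the file's bytes to `out` (stdout by default), then delete it.

    Hypothetical helper, not a Flume API. Returns the number of bytes written.
    """
    out = out if out is not None else sys.stdout.buffer
    written = 0
    with open(path, "rb") as src:
        # Stream in 64 KiB chunks so large log files are not held in memory.
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            out.write(chunk)
            written += len(chunk)
    out.flush()
    # The file is removed only after every byte has been written and flushed.
    os.remove(path)
    return written


if __name__ == "__main__":
    # Each command-line argument is treated as a log file to emit and delete.
    for log_path in sys.argv[1:]:
        emit_and_remove(log_path)
```

Note that this deletes the file once it has been written to stdout, not once
delivery downstream has been confirmed, which is part of why a built-in
cleanup option would be preferable.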

Thanks,
Brandon