Posted to user@flume.apache.org by alo alt <wg...@googlemail.com> on 2012/02/15 09:06:05 UTC

Re: Scale of a flume collector

Hi,

That depends on the sink you want to use. Let's say you use E2E chains, the collectors run on physical hardware, and you use compression; then I would put 10 agents per collector (180 MB/s * 60 s = 10.8 GB per minute, with minute-based file closing). To get closer to real time I would suggest a 10-second roll, but rolling more often than every 10 seconds could create a bottleneck at peak times.
The collectors need fast hard disks.
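
For illustration, a minimal collector spec along those lines might look like the following in Flume OG's configuration syntax (the node name, port, and HDFS path are placeholders, and GzipCodec is only one possible codec):

  # collector node: receive events from agents, roll (close) files every 10 000 ms
  collector01 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "log-", 10000)

  # flume-site.xml: compress output files before they are written to HDFS
  <property>
    <name>flume.collector.dfs.compress.codec</name>
    <value>GzipCodec</value>
  </property>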


best,
 Alex 

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 14, 2012, at 8:25 PM, Kim, Jongkook wrote:

> Hi all.
> 
> I'm in the middle of hardware provisioning for a Flume-HBase-Hadoop solution.
> The plan is for the Flume agents to collect log data and pass it to collectors, and for the collectors to write the data into HBase using a sink.
> My question is about the scale of a Flume collector.
> 
> Flume agents: 250
> Data receiving rate: 5.78 MB/second
> Data writing rate: 17.9 MB/second
> Number of data nodes: 12
> 
> This system will serve a real-time use case, so there shouldn't be any delay.
> How many collectors are required to handle this load?
> 
> Thanks in advance,


Re: Scale of a flume collector

Posted by alo alt <wg...@googlemail.com>.
Hi Kim,

Yes, 180 MB/s is roughly what one collector can handle, based on modern hardware and a well-switched network. The throughput to HDFS depends on other variables in your Hadoop cluster.
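
As a rough back-of-the-envelope check against your numbers (an estimate, not a benchmark): if the 5.78 MB/s receiving rate is the aggregate across all 250 agents, a single collector covers the throughput with plenty of headroom, and any extra collectors are there for failover rather than capacity. If it is 5.78 MB/s per agent, the aggregate is 250 * 5.78 MB/s ≈ 1,445 MB/s, which would call for about 1,445 / 180 ≈ 8 collectors, plus spares.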

Flume can compress the data before it is written; that is what I had in mind. Compression costs time and CPU, but even if you don't compress on the collector, 10 agents per collector should be okay. You should also add some spare collectors to cover a server crash; in the past I have used autoCollectorSource for that.
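
As a sketch, such a failover setup could be wired up like this in Flume OG syntax (node names and the tail source are made up for the example; since you are on DFO, the agents use the DFO chain):

  # spare-friendly collectors: both register under the automatic chains
  collector01 : autoCollectorSource | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "log-")
  collector02 : autoCollectorSource | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "log-")

  # agents fail over between live collectors with disk-failover (DFO) reliability
  agent01 : tail("/var/log/app.log") | autoDFOChain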

best,
 Alex     

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 15, 2012, at 5:52 PM, Kim, Jongkook wrote:

> Thanks Alex,
> 
> When you say "180 MB/s", is that the data handling capacity of one collector?
> The data sizes I listed in the email are uncompressed, and we are using DFO.
> If the data is compressed, do we still need 1 collector for every 10 agents?
> 
> Thanks in advance,
> 
> 
> -----Original Message-----
> From: alo alt [mailto:wget.null@googlemail.com] 
> Sent: Wednesday, February 15, 2012 3:06 AM
> To: flume-user@incubator.apache.org
> Subject: Re: Scale of a flume collector
> 
> Hi,
> 
> That depends on the sink you want to use. Let's say you use E2E chains, the collectors run on physical hardware, and you use compression; then I would put 10 agents per collector (180 MB/s * 60 s = 10.8 GB per minute, with minute-based file closing). To get closer to real time I would suggest a 10-second roll, but rolling more often than every 10 seconds could create a bottleneck at peak times.
> The collectors need fast hard disks.
> 
> 
> best,
> Alex 
> 
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
> 
> On Feb 14, 2012, at 8:25 PM, Kim, Jongkook wrote:
> 
>> Hi all.
>> 
>> I'm in the middle of hardware provisioning for a Flume-HBase-Hadoop solution.
>> The plan is for the Flume agents to collect log data and pass it to collectors, and for the collectors to write the data into HBase using a sink.
>> My question is about the scale of a Flume collector.
>> 
>> Flume agents: 250
>> Data receiving rate: 5.78 MB/second
>> Data writing rate: 17.9 MB/second
>> Number of data nodes: 12
>> 
>> This system will serve a real-time use case, so there shouldn't be any delay.
>> How many collectors are required to handle this load?
>> 
>> Thanks in advance,
> 


RE: Scale of a flume collector

Posted by "Kim, Jongkook " <jo...@citi.com>.
Thanks Alex,

When you say "180 MB/s", is that the data handling capacity of one collector?
The data sizes I listed in the email are uncompressed, and we are using DFO.
If the data is compressed, do we still need 1 collector for every 10 agents?

Thanks in advance,


-----Original Message-----
From: alo alt [mailto:wget.null@googlemail.com] 
Sent: Wednesday, February 15, 2012 3:06 AM
To: flume-user@incubator.apache.org
Subject: Re: Scale of a flume collector

Hi,

> That depends on the sink you want to use. Let's say you use E2E chains, the collectors run on physical hardware, and you use compression; then I would put 10 agents per collector (180 MB/s * 60 s = 10.8 GB per minute, with minute-based file closing). To get closer to real time I would suggest a 10-second roll, but rolling more often than every 10 seconds could create a bottleneck at peak times.
> The collectors need fast hard disks.


best,
 Alex 

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 14, 2012, at 8:25 PM, Kim, Jongkook wrote:

> Hi all.
> 
> I'm in the middle of hardware provisioning for a Flume-HBase-Hadoop solution.
> The plan is for the Flume agents to collect log data and pass it to collectors, and for the collectors to write the data into HBase using a sink.
> My question is about the scale of a Flume collector.
> 
> Flume agents: 250
> Data receiving rate: 5.78 MB/second
> Data writing rate: 17.9 MB/second
> Number of data nodes: 12
> 
> This system will serve a real-time use case, so there shouldn't be any delay.
> How many collectors are required to handle this load?
> 
> Thanks in advance,