Posted to user@flume.apache.org by Blade Liu <ha...@gmail.com> on 2014/09/25 07:53:32 UTC

Performance of Flume in production systems

Hi,

I'm going to deploy Flume in production systems, but I'm a little worried about
its performance in a real-world environment. Could anyone tell me about
Flume's actual performance in a production environment? Say, can Flume
deal with 20,000 events per second from a single source (and what about
100-200 sources feeding one final HDFS sink)?

In addition, to reach tens of thousands of events per second, how many
servers (agents) should be used? Do more agents (and more tiers) mean
better performance?
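For a rough sense of the sizing arithmetic behind this question, here is a minimal Python sketch. It assumes each source produces the same steady rate, that you have load-tested one collector agent yourself, and that throughput scales linearly with the number of collectors; `collectors_needed` and its parameters are hypothetical names, not part of Flume.

```python
import math

def collectors_needed(num_sources: int, eps_per_source: float,
                      collector_eps: float, headroom: float = 0.3) -> int:
    """Estimate how many collector-tier agents a fan-in topology needs.

    collector_eps: sustained rate ONE collector reached in your own load test.
    headroom: fraction of that rate kept in reserve for bursts and failover.
    Assumes (hypothetically) that throughput scales linearly with collectors.
    """
    if collector_eps <= 0 or not 0 <= headroom < 1:
        raise ValueError("collector_eps must be > 0 and headroom in [0, 1)")
    total_eps = num_sources * eps_per_source          # aggregate inbound rate
    usable_eps = collector_eps * (1 - headroom)       # rate we plan to use per collector
    return math.ceil(total_eps / usable_eps)          # round up to whole agents
```

For example, if each of 150 sources sends 20,000 eps (3,000,000 eps in total) and one collector sustained a hypothetical 50,000 eps in testing, `collectors_needed(150, 20_000, 50_000, headroom=0.5)` gives 120. Whether extra agents or tiers actually help depends on where the bottleneck is (for instance a single final HDFS sink), which this arithmetic does not capture.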

Thanks very much for your suggestions.


Cheers,
Blade

Re: Performance of Flume in production systems

Posted by Roshan Naik <ro...@hortonworks.com>.
IMO, the channel will most likely play a huge role in your performance. The
performance difference between the memory channel and the file channel is
quite large.
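A minimal Python sketch of why the channel matters (hypothetical helper names, not Flume APIs): over a long run a source -> channel -> sink flow can go no faster than its slowest stage, and the channel's `capacity` setting only buys time during bursts. The rates are inputs you would measure for your own memory or file channel; none are taken from Flume benchmarks.

```python
import math

def sustained_eps(source_eps: float, channel_eps: float, sink_eps: float) -> float:
    """Long-run ceiling of a source -> channel -> sink flow: the slowest stage wins."""
    return min(source_eps, channel_eps, sink_eps)


def seconds_until_full(capacity_events: int, inflow_eps: float, outflow_eps: float) -> float:
    """How long a channel that starts empty absorbs a burst before it is full.

    capacity_events plays the role of the channel's capacity setting;
    both rates are assumed constant for the duration of the burst.
    """
    surplus = inflow_eps - outflow_eps
    if surplus <= 0:
        return math.inf  # the sink keeps up, so the channel never fills
    return capacity_events / surplus
```

With a hypothetical capacity of 100,000 events, a 30,000 eps burst into a sink draining 20,000 eps fills the channel in 10 seconds; after that the source is held back to whatever the slowest stage allows.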

On Thu, Sep 25, 2014 at 8:53 PM, Blade Liu <ha...@gmail.com> wrote:


Re: Performance of Flume in production systems

Posted by Blade Liu <ha...@gmail.com>.
Hi Asim and Jeff,

Thanks for your helpful suggestions. I found two excellent articles, one on
performance testing and the other on deployment design/optimization in a
production system.

Flume NG Performance Measurements
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

Log collection system architecture and design at Meituan.com (in Chinese; I
strongly recommend using the Chrome translator to read it)
http://tech.meituan.com/mt-log-system-arch.html
http://tech.meituan.com/mt-log-system-optimization.html

I guess building a stable and efficient collection system is both
challenging and fun.


Cheers,
Blade

2014-09-26 3:15 GMT+08:00 Jeff Lord <jl...@cloudera.com>:

Re: Performance of Flume in production systems

Posted by Jeff Lord <jl...@cloudera.com>.
Whether or not Flume can handle 20k events per second (eps) will depend on
several factors, the main ones being:
1. What is the average event size?
2. Which source will you be using?

That said, I have seen a single Flume agent handle well over 20k eps
using the multiport syslog source.
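Why event size matters: the same event rate implies very different byte rates for the channel, the network and HDFS. A minimal Python sketch (hypothetical helper names; the event sizes below are illustrative assumptions, not measurements):

```python
def bytes_per_second(eps: float, avg_event_bytes: float) -> float:
    """Raw bytes per second an agent must move at a given event rate."""
    return eps * avg_event_bytes


def raw_gb_per_day(eps: float, avg_event_bytes: float) -> float:
    """Uncompressed daily volume in decimal gigabytes (10**9 bytes)."""
    return bytes_per_second(eps, avg_event_bytes) * 86_400 / 1e9
```

At 20,000 eps with 500-byte events that is 10,000,000 bytes per second, or 864 GB of raw data per day before compression or HDFS replication; with 2,000-byte events the same 20k eps is ten times the load of 200-byte events.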

Here is a link to a presentation given by Arvind Prabhakar on planning a
Flume deployment.

http://goo.gl/FsfmmC

-Jeff

On Wed, Sep 24, 2014 at 10:53 PM, Blade Liu <ha...@gmail.com> wrote:

Re: Performance of Flume in production systems

Posted by Asim Zafir <as...@gmail.com>.
It really depends, but I have a couple of questions before a proper
suggestion can be made:

What kind of agent are you using in your pipeline to sink to HDFS?
Does your pipeline involve a collector?
What kind of channel are you using across the data pipeline?
How frequently do you want to roll the files that Flume writes to HDFS?
It would be helpful to see your data pipeline architecture before making a
suggestion.
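On the roll question: the HDFS sink can roll a file on time (`hdfs.rollInterval`), size (`hdfs.rollSize`) or event count (`hdfs.rollCount`), whichever fires first, with 0 disabling a trigger. A minimal Python sketch of that rule under a steady event rate; the function names and example rates are hypothetical, and idle timeouts, path bucketing and compression are ignored.

```python
import math

def seconds_per_file(eps: float, avg_event_bytes: float,
                     roll_interval: float = 0, roll_size: float = 0,
                     roll_count: float = 0) -> float:
    """Estimate how long one HDFS file stays open under a steady event rate.

    The first enabled trigger wins; 0 disables a trigger, mirroring the
    meaning of hdfs.rollInterval, hdfs.rollSize and hdfs.rollCount.
    """
    if eps <= 0 or avg_event_bytes <= 0:
        raise ValueError("eps and avg_event_bytes must be positive")
    candidates = []
    if roll_interval > 0:
        candidates.append(roll_interval)                        # time trigger
    if roll_size > 0:
        candidates.append(roll_size / (eps * avg_event_bytes))  # size trigger
    if roll_count > 0:
        candidates.append(roll_count / eps)                     # count trigger
    return min(candidates) if candidates else math.inf


def files_per_hour(seconds_open: float) -> float:
    """Files produced per hour by one sink writing to one path."""
    return 3600 / seconds_open
```

For example, at 20,000 eps with 500-byte events, `roll_interval=300` and `roll_size=134_217_728` (128 MiB), the size trigger fires after about 13.4 seconds, roughly 268 files per hour from one sink, long before the 5-minute interval matters.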

Asim Zafir

On Wed, Sep 24, 2014 at 10:53 PM, Blade Liu <ha...@gmail.com> wrote:
