Posted to user@flume.apache.org by liuyongbo <li...@baidu.com> on 2013/05/15 07:37:30 UTC

how to print the channel capacity

Hi:

         I'm using Flume to pass log data to MongoDB, but I find that some
data is lost under heavy load, so I want to know the maximum number of
requests that Flume can hold, and I need to print the capacity. But I cannot
find a proper way to do this other than changing the source code. Any ideas?

         thanks


Re: Reply: how to print the channel capacity

Posted by Nitin Pawar <ni...@gmail.com>.
Instead of the memory channel, can you try the file channel?

I think when you say the exact point that balances input and output, you
want to figure out how many events the memory channel can buffer before you
start losing events. Is that correct?

From http://flume.apache.org/FlumeUserGuide.html#memory-channel

Property             Default  Description
capacity             100      The max number of events stored in the channel
transactionCapacity  100      The max number of events stored in the channel per transaction
keep-alive           3        Timeout in seconds for adding or removing an event
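
As a rough illustration of the question above (how long a bounded channel can buffer before it fills), here is a back-of-the-envelope sketch in Python. It assumes constant input and output rates, which a real agent will not have. The function name and the example rates are made up for illustration; only the default capacity of 100 comes from the user guide excerpt above.

```python
import math


def seconds_until_full(capacity, in_rate, out_rate, current_size=0):
    """Estimate seconds until a bounded channel fills (constant-rate model).

    capacity     -- max events the channel holds (the `capacity` property)
    in_rate      -- events per second put by the source (assumed constant)
    out_rate     -- events per second taken by the sink (assumed constant)
    current_size -- events already sitting in the channel
    Returns math.inf when the sink keeps up with the source.
    """
    net = in_rate - out_rate
    if net <= 0:
        return math.inf
    return (capacity - current_size) / net


if __name__ == "__main__":
    # Hypothetical rates: 500 events/s in, 450 events/s out, capacity 100.
    print(seconds_until_full(100, 500, 450))  # prints 2.0
```

Under this simple model there is no single "balance point" other than the output rate being at least the input rate; a larger capacity only buys more time during bursts.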







-- 
Nitin Pawar

Reply: how to print the channel capacity

Posted by liuyongbo <li...@baidu.com>.
Thanks for your answer.

Additionally, I'm using the memory channel and writing logs to MongoDB. When
the input log arrives faster than it is consumed (written to MongoDB), the
queue grows, and when it reaches the max, new input is lost.

So what I want to know is the exact point that balances the input and
output.

 



Re: how to print the channel capacity

Posted by Nitin Pawar <ni...@gmail.com>.
Here is one example of a flow that defines channel capacity:
https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html





-- 
Nitin Pawar

Re: how to print the channel capacity

Posted by Nitin Pawar <ni...@gmail.com>.
Sorry, pressed enter too soon.

As for your question, how many events can a Flume agent hold?
Sorry, but I don't think there is any direct answer to that. I may very
well be wrong, as I am pretty new to Flume myself.

There was a JIRA about the capacity of file channels: FLUME-1571





-- 
Nitin Pawar

Re: how to print the channel capacity

Posted by Nitin Pawar <ni...@gmail.com>.
For maximum performance of your data flow, the two things that will matter
most are the channel and the transaction batch size.
When you say you are losing data, are you using the memory channel or the
file channel?

Flume can batch events. The batch size is the maximum number of events that
a sink or client will attempt to take from a channel in a single
transaction.

What is the channel type?
Do you have a slow sink, so that fewer events are written out than come into
the channel, and over time they pile up?
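
The pile-up described above can be sketched with a toy model in Python. This is not Flume code: the function name, the "tick" time step and the numbers are all hypothetical, and the model simply drops events when the channel is full, without modeling what a real source does when a put fails.

```python
from collections import deque


def simulate(capacity, put_per_tick, batch_size, ticks):
    """Toy bounded channel; returns (events left in channel, events dropped)."""
    channel = deque()
    dropped = 0
    next_id = 0
    for _ in range(ticks):
        # Source side: put events until the channel is full, drop the rest.
        for _ in range(put_per_tick):
            if len(channel) < capacity:
                channel.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # Sink side: one transaction takes at most batch_size events.
        for _ in range(min(batch_size, len(channel))):
            channel.popleft()
    return len(channel), dropped


if __name__ == "__main__":
    # Source outpaces sink by 10 events per tick: the backlog grows, then drops start.
    print(simulate(capacity=100, put_per_tick=30, batch_size=20, ticks=20))  # (80, 120)
    # Sink keeps up: nothing piles up and nothing is dropped.
    print(simulate(capacity=100, put_per_tick=20, batch_size=20, ticks=50))  # (0, 0)
```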

Others may point out more things.
Also, please share your Flume conf and any errors you are seeing in Flume;
that will help people find the problem.





-- 
Nitin Pawar

Reply: how to print the channel capacity

Posted by liuyongbo <li...@baidu.com>.
Thank you very much. The monitor is useful.

 


Re: how to print the channel capacity

Posted by Matt Wise <ma...@nextdoor.com>.
http://engblog.nextdoor.com/post/50507841273/apache-flume-performance-monitoring

--Matt



Re: how to print the channel capacity

Posted by Matt Wise <ma...@nextdoor.com>.
We do the same thing, but with Collectd as our graphing/collection mechanism. I am actually going to do a blog post in the next day or two with the code to our flume data collection script, and some example graphs/etc. We've done a similar thing with Zookeeper monitoring (http://engblog.nextdoor.com/post/49942956311/apache-zookeeper-performance-monitoring). 

--Matt



RE: how to print the channel capacity

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
There are a few ways to monitor flume in operation. We use the JSON reporting, which is available via 'http://<agent address>:<port>/metrics'. You need to start the agent with the following parameters to get this interface:

-Dflume.monitoring.type=http -Dflume.monitoring.port=34545

We use cacti to graph channel size both as a percentage of maximum and absolute number of events in channel. This provides warning if the sinks cannot keep up with the sources.
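
For anyone who wants to print these numbers without a graphing tool, a minimal Python sketch follows. It assumes the JSON document has top-level keys such as "CHANNEL.<name>" whose values carry "ChannelSize" and "ChannelCapacity" as strings, which is how the Flume User Guide's JSON reporting example looks; check the output of your own agent before relying on those names. The helper names and the sample data are made up for illustration.

```python
import json
import urllib.request


def fetch_metrics(url):
    """Fetch the agent's JSON metrics document and return it as a dict."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def channel_fill(metrics):
    """Return {channel_key: (size, capacity, percent_full)} for each channel."""
    result = {}
    for name, values in metrics.items():
        if not name.startswith("CHANNEL."):
            continue  # skip sources and sinks
        size = int(values["ChannelSize"])
        capacity = int(values["ChannelCapacity"])
        percent = 100.0 * size / capacity if capacity else 0.0
        result[name] = (size, capacity, percent)
    return result


if __name__ == "__main__":
    # Sample data in the assumed shape; against a live agent you would use
    # fetch_metrics("http://<agent address>:34545/metrics") instead.
    sample = {
        "CHANNEL.memChannel": {"ChannelSize": "25", "ChannelCapacity": "100"},
        "SINK.mongoSink": {"Type": "SINK"},
    }
    for name, (size, cap, pct) in channel_fill(sample).items():
        print(f"{name}: {size}/{cap} events ({pct:.1f}% full)")
```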

We also graph ingress/egress event counts, much like a network bandwidth graph, for some channels to get an idea of the throughput and to see if sources/sinks are running at the same speed.
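
A rough Python sketch of the ingress/egress comparison: take two snapshots of the metrics a known number of seconds apart and turn the counter differences into rates. It assumes each "CHANNEL.<name>" entry carries "EventPutSuccessCount" and "EventTakeSuccessCount" as strings, as in the Flume User Guide's JSON reporting example; the function name and the sample numbers are hypothetical.

```python
def event_rates(before, after, seconds):
    """Return {channel_key: (puts_per_sec, takes_per_sec)} between two snapshots."""
    rates = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None or not name.startswith("CHANNEL."):
            continue  # only channels present in both snapshots
        puts = int(new["EventPutSuccessCount"]) - int(old["EventPutSuccessCount"])
        takes = int(new["EventTakeSuccessCount"]) - int(old["EventTakeSuccessCount"])
        rates[name] = (puts / seconds, takes / seconds)
    return rates


if __name__ == "__main__":
    # Two made-up snapshots taken 60 seconds apart.
    before = {"CHANNEL.c1": {"EventPutSuccessCount": "1000", "EventTakeSuccessCount": "900"}}
    after = {"CHANNEL.c1": {"EventPutSuccessCount": "1600", "EventTakeSuccessCount": "1200"}}
    print(event_rates(before, after, 60))  # {'CHANNEL.c1': (10.0, 5.0)}
```

If the put rate stays above the take rate, as in this sample, the channel is filling and the sink is the bottleneck.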