Posted to user@flume.apache.org by Sverre Bakke <sv...@gmail.com> on 2015/01/29 16:47:57 UTC

How to handle ChannelFullException

Hi,

I have a syslogtcp source using a default memory channel and a Kafka
sink. When producing data as fast as possible (3000 syslog events per
second), the source seems to accept all the data, but then fails with a
ChannelFullException when adding the events to the channel.

Is there any way to throttle, or otherwise wait to receive more syslog
events until the channel has free capacity again, rather than failing
because the channel is full? I would prefer that Flume accept syslog
events more slowly rather than failing and dropping events.

29 Jan 2015 16:26:56,721 ERROR [New I/O  worker #2]
(org.apache.flume.source.SyslogTcpSource$syslogTcpHandler.messageReceived:94)
 - Error writting to channel, event dropped

Also, the syslogtcp source seems to keep the syslog headers regardless
of the keepFields setting. Is there any common reason why this might
happen? In contrast, the multiport syslog TCP source works as expected
with this particular setting.
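
A minimal configuration sketch of this kind of setup might look as
follows (not the actual config: agent/component names, host, port and
topic are placeholders, and the Kafka sink properties follow the sink
bundled with Flume 1.6, so an older or third-party Kafka sink may use
different keys):

a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Plain syslog TCP source (the one showing the keepFields problem)
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.keepFields = false
a1.sources.r1.channels = c1

# Memory channel left at (assumed) defaults; a capacity of 100 events
# fills almost immediately at 3000 events/s
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.topic = syslog
a1.sinks.k1.brokerList = kafka-host:9092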

Re: How to handle ChannelFullException

Posted by Sverre Bakke <sv...@gmail.com>.
Hi,

I currently have only a single sink. I actually did not know I could
have several sinks attached to the same channel. I will try this, as
well as increasing the channel size, and see if that gets rid of these
ChannelFullExceptions.

On Thu, Jan 29, 2015 at 8:45 PM, Hari Shreedharan
<hs...@cloudera.com> wrote:
> How many sinks do you have? Adding more sinks increases parallelism and will
> clear the channel faster, provided the downstream system can handle the
> load.
>
> Thanks,
> Hari

Re: How to handle ChannelFullException

Posted by Hari Shreedharan <hs...@cloudera.com>.
How many sinks do you have? Adding more sinks increases parallelism and will clear the channel faster, provided the downstream system can handle the load.
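
A rough sketch of that suggestion, assuming two Kafka sinks draining the
same channel (each sink gets its own runner thread, so the two take
events off the channel in parallel; names and Kafka sink properties are
placeholders):

a1.sinks = k1 k2

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.topic = syslog
a1.sinks.k1.brokerList = kafka-host:9092

a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k2.channel = c1
a1.sinks.k2.topic = syslog
a1.sinks.k2.brokerList = kafka-host:9092

Both sinks simply point at the same channel; a sink group is only needed
if failover or load-balancing behaviour is wanted.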




Thanks, Hari

On Thu, Jan 29, 2015 at 9:41 AM, Sverre Bakke <sv...@gmail.com>
wrote:

> Hi,
> Thanks for your feedback. I can of course switch to the multiport one
> if the plain one is not maintained.
> Back to the ChannelFullException issue: I can increase the channel
> size, but the basic problem remains. As long as the syslog client is
> faster than the Flume sink, this exception will occur and data will be
> lost. I really believe that blocking, so that the syslog client must
> wait before sending more data, is the way to go for a robust solution.
> Let's assume that the syslog client reads batches of events, e.g. from a
> file, and sends them as fast as possible to the Flume multiport TCP
> syslog source. In such cases, the average events-per-second rate would
> be moderate, but in practice there would be huge spikes where the
> client delivers as fast as possible. Instead of asking the client
> to "slow down", Flume accepts the events and then drops them. This
> forces me as an admin to monitor the logs and try to guess which
> events were dropped. In that situation I can have a reliable and
> persistent channel configured, but events will still be dropped,
> undermining the entire solution.

Re: How to handle ChannelFullException

Posted by Sverre Bakke <sv...@gmail.com>.
Hi,

Thanks for your feedback. I can of course switch to the multiport one
if the plain one is not maintained.

Back to the ChannelFullException issue: I can increase the channel
size, but the basic problem remains. As long as the syslog client is
faster than the Flume sink, this exception will occur and data will be
lost. I really believe that blocking, so that the syslog client must
wait before sending more data, is the way to go for a robust solution.

Let's assume that the syslog client reads batches of events, e.g. from a
file, and sends them as fast as possible to the Flume multiport TCP
syslog source. In such cases, the average events-per-second rate would
be moderate, but in practice there would be huge spikes where the
client delivers as fast as possible. Instead of asking the client to
"slow down", Flume accepts the events and then drops them. This forces
me as an admin to monitor the logs and try to guess which events were
dropped. In that situation I can have a reliable and persistent channel
configured, but events will still be dropped, undermining the entire
solution.
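
None of this gives true end-to-end backpressure (the syslogtcp source
logs the error and drops the event rather than pushing back on the
client), but two memory channel settings are at least related to this
behaviour, if I read the channel documentation correctly: capacity, to
absorb bursts, and keep-alive, the number of seconds a put waits for
free space before the ChannelFullException is thrown. A hypothetical
tuning sketch:

a1.channels.c1.type = memory
# Large enough to absorb a burst from the client
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000
# Seconds a put blocks waiting for free space before failing
a1.channels.c1.keep-alive = 30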



On Thu, Jan 29, 2015 at 4:56 PM, Jeff Lord <jl...@cloudera.com> wrote:
> Have you considered increasing the size of the memory channel? I haven't
> played with the Kafka sink much, but with regard to HDFS we often add
> sinks, which can help increase the flow out of the channel.
> The multiport syslog source is the way to go here, as it will give better
> performance. We should probably go ahead and deprecate the vanilla syslog
> source.

Re: How to handle ChannelFullException

Posted by Jeff Lord <jl...@cloudera.com>.
Have you considered increasing the size of the memory channel? I haven't
played with the Kafka sink much, but with regard to HDFS we often add
sinks, which can help increase the flow out of the channel.
The multiport syslog source is the way to go here, as it will give better
performance. We should probably go ahead and deprecate the vanilla syslog
source.
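
For what that switch might look like, a hypothetical multiport source
definition (port numbers are placeholders; ports takes a space-separated
list):

a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.ports = 5140 5141
a1.sources.r1.keepFields = false
a1.sources.r1.channels = c1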

On Thursday, January 29, 2015, Sverre Bakke <sv...@gmail.com> wrote:

> Hi,
>
> I have a syslogtcp source using a default memory channel and a Kafka
> sink. When producing data as fast as possible (3000 syslog events per
> second), the source seems to accept all the data, but then fails with a
> ChannelFullException when adding the events to the channel.
>
> Is there any way to throttle, or otherwise wait to receive more syslog
> events until the channel has free capacity again, rather than failing
> because the channel is full? I would prefer that Flume accept syslog
> events more slowly rather than failing and dropping events.
>
> 29 Jan 2015 16:26:56,721 ERROR [New I/O  worker #2]
>
> (org.apache.flume.source.SyslogTcpSource$syslogTcpHandler.messageReceived:94)
>  - Error writting to channel, event dropped
>
> Also, the syslogtcp source seems to keep the syslog headers regardless
> of the keepFields setting. Is there any common reason why this might
> happen? In contrast, the multiport syslog TCP source works as expected
> with this particular setting.
>