Posted to user@flume.apache.org by Borja Garrido <bo...@cern.ch> on 2016/05/24 12:54:52 UTC

Flume keeps files in file channel but doesn't seem to be sending anything

Hi all,

I've been experiencing a really weird behavior with Flume: my sinks 
weren't working, so data started accumulating in the file channels, 
which caused the number of data files in them to grow.

When I detected that, I stopped the agent and the source, and then 
tried to start the agent again so I could drain the channel, but I saw 
the log replay skipping the events.

After some reading I moved the checkpoint folder aside (with the agent 
stopped) so it would be empty on the next start. The replay then 
started taking the old log files in the channel into account, but it 
ended up creating a new one and not doing anything with the rest, so 
right now I have around 20 log files in the channels that weigh 1.6 GB 
each, and Flume apparently isn't taking care of them.

Of course, for the replay to work I needed to increase the 
transactionCapacity of the channel:

agent-hdfssink.channels.cn.type = file
agent-hdfssink.channels.cn.checkpointDir = /var/spool/flume/n/checkpoint
agent-hdfssink.channels.cn.dataDirs = /var/spool/flume/ln/data
agent-hdfssink.channels.cn.transactionCapacity = 1000
agent-hdfssink.channels.cn.capacity = 6000000

The kind of sink I'm using is HDFS. My question is whether this is 
normal behavior and whether there is any way to make Flume send this 
data, as it seems it doesn't take care of the older log files.
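
For reference, the sink side is attached to the same channel roughly 
like this; the sink name and the HDFS path below are placeholders 
rather than my real values, so take it just as a sketch of the shape 
of the config:

agent-hdfssink.sinks = hs
# "hs" and the path are placeholders -- the real sink name and HDFS path differ
agent-hdfssink.sinks.hs.type = hdfs
agent-hdfssink.sinks.hs.channel = cn
agent-hdfssink.sinks.hs.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent-hdfssink.sinks.hs.hdfs.fileType = DataStream
# batch size kept at or below the channel's transactionCapacity
agent-hdfssink.sinks.hs.hdfs.batchSize = 1000
# time-based rolling only (size/count rolling disabled)
agent-hdfssink.sinks.hs.hdfs.rollInterval = 300
agent-hdfssink.sinks.hs.hdfs.rollSize = 0
agent-hdfssink.sinks.hs.hdfs.rollCount = 0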

I've also tried moving everything out of the channel and leaving just 
one log file with its metadata there (same result), and no errors in 
either case :S.

Thanks in advance for any help
Cheers,
Borja

Re: Flume keeps files in file channel but doesn't seem to be sending anything

Posted by Attila Simon <sa...@cloudera.com>.
Hi Borja,

I would need more information to confirm this, but at first glance this
sounds like a destination-related issue (I suspect the sink cannot commit
the transaction because the write fails). Are you sure that HDFS would
otherwise be able to accept data from your Flume sink?
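
One quick way to separate the two cases is to temporarily point the
channel at a null sink and watch whether the backlog drains; a rough
sketch, reusing the agent and channel names from your config (the sink
name here is just an example, and note this throws the drained events
away, so only do it if losing that data is acceptable):

agent-hdfssink.sinks = drain-test
# temporarily replaces the HDFS sink; discards every event it takes,
# useful only to check that the channel itself can drain
agent-hdfssink.sinks.drain-test.type = null
agent-hdfssink.sinks.drain-test.channel = cn
agent-hdfssink.sinks.drain-test.batchSize = 1000

If the backlog drains with that in place, the channel and replay are
fine and the problem is on the HDFS side (connectivity, permissions,
quota); if not, the channel itself is stuck.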

What does your sink/source configuration look like? Do you have an excerpt
of the log about the skipped events?

Cheers,
Attila


*Attila Simon*
Software Engineer
Email:   sati@cloudera.com

