Posted to user@flume.apache.org by Zhiwen Sun <pe...@gmail.com> on 2013/03/19 09:19:19 UTC

Why does the used space of the file channel buffer directory increase?

Hi all,

I am testing flume-ng on my local machine. The data flow is:

  tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
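
In concrete shell terms, the sending side is roughly the following (a sketch
only: the log path is a placeholder, and the host/port are the ones the
netcat source is configured to listen on below):

  tail -F /var/log/app.log | nc 192.168.201.197 44444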

My configuration file is here:

a1.sources = r1
> a1.channels = c2
>
> a1.sources.r1.type = netcat
> a1.sources.r1.bind = 192.168.201.197
> a1.sources.r1.port = 44444
> a1.sources.r1.max-line-length = 1000000
>
> a1.sinks.k1.type = logger
>
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 10000
> a1.channels.c1.transactionCapacity = 10000
>
> a1.channels.c2.type = file
> a1.sources.r1.channels = c2
>
> a1.sources.r1.interceptors = i1
> a1.sources.r1.interceptors.i1.type = timestamp
>
> a1.sinks = k2
> a1.sinks.k2.type = hdfs
> a1.sinks.k2.channel = c2
> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> a1.sinks.k2.hdfs.writeFormat = Text
> a1.sinks.k2.hdfs.rollInterval = 10
> a1.sinks.k2.hdfs.rollSize = 10000000
> a1.sinks.k2.hdfs.rollCount = 0
>
> a1.sinks.k2.hdfs.filePrefix = app
> a1.sinks.k2.hdfs.fileType = DataStream
>



It seems that events were collected correctly.

But there is a problem bothering me: the used space of the file channel
directory (~/.flume) keeps increasing, even when there are no new events.

Is my configuration wrong, or is there some other problem?

Thanks.


Best regards.

Zhiwen Sun

Re: Why does the used space of the file channel buffer directory increase?

Posted by Zhiwen Sun <pe...@gmail.com>.
Thanks for your reply.

I will try syslog as the source.

Zhiwen Sun



On Wed, Mar 20, 2013 at 3:11 PM, Alexander Alten-Lorenz <wget.null@gmail.com
> wrote:

> HI,
>
> I suspect tail -F and nc for filling up the directory. Whats inside of
> such a file which grows without a event?
>
> My assumption:
> nc is open one stream, and deliver over this stream all incoming events.
> Flume doesn't know that no event is coming in, since the stream never
> breaks up. I wondering if you could use syslog(-ng) for the event delivery?
>
> Cheers,
>  Alex
>
>
>
> On Mar 20, 2013, at 2:30 AM, Zhiwen Sun <pe...@gmail.com> wrote:
>
> > Thanks all for your reply.
> >
> > @Kenison
> > I stop my tail -F | nc program and there is no new event file in HDFS,
> so I think there is no event arrive. To make sure, I will test again with
> enable JMX.
> >
> > @Alex
> >
> > The latest log is following. I can't see any exception or warning.
> >
> > 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> > 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> > 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 3
> > 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
> queueHead: 362981
> > 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta
> currentPosition = 216278208, logWriteOrderID = 1363659953997
> > 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
> logWriteOrderID: 1363659953997
> > 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> > 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> > 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> > 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
> >
> > 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 2
> > 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
> queueHead: 362981
> > 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta
> currentPosition = 216288815, logWriteOrderID = 1363659954200
> > 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
> logWriteOrderID: 1363659954200
> > 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
> >
> > @Hari
> > em, 12 hours passed. The size of file channel directory has no reduce.
> >
> > Files in file channel directory:
> >
> > -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> > -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> > -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> > -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
> ./file-channel/data/log-7
> > -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
> ./file-channel/data/log-6.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
> ./file-channel/data/log-7.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
> ./file-channel/data/in_use.lock
> > -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
> ./file-channel/data/log-6
> >
> >
> >
> >
> >
> > Zhiwen Sun
> >
> >
> >
> > On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
> > It is possible for the directory size to increase even if no writes are
> going in to the channel. If the channel size is non-zero and the sink is
> still writing events to HDFS, the takes get written to disk as well (so we
> know what events in the files were removed when the channel/agent
> restarts). Eventually the channel will clean up the files which have all
> events taken (though it will keep at least 2 files per data directory, just
> to be safe).
> >
> > --
> > Hari Shreedharan
> >
> > On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
> >
> >> Hey,
> >>
> >> what says debug? Do you can gather logs and attach them?
> >>
> >> - Alex
> >>
> >> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com>
> wrote:
> >>
> >>> Check the JMX counter first, to make sure you really are not sending
> new events. If not, is it your checkpoint directory or data directory that
> is increasing in size?
> >>>
> >>>
> >>> From: Zhiwen Sun <pe...@gmail.com>
> >>> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
> >>> Date: Tue, 19 Mar 2013 01:19:19 -0700
> >>> To: "user@flume.apache.org" <us...@flume.apache.org>
> >>> Subject: Why used space of flie channel buffer directory increase?
> >>>
> >>> hi all:
> >>>
> >>> I test flume-ng in my local machine. The data flow is :
> >>>
> >>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
> >>>
> >>> My configuration file is here :
> >>>
> >>>> a1.sources = r1
> >>>> a1.channels = c2
> >>>>
> >>>> a1.sources.r1.type = netcat
> >>>> a1.sources.r1.bind = 192.168.201.197
> >>>> a1.sources.r1.port = 44444
> >>>> a1.sources.r1.max-line-length = 1000000
> >>>>
> >>>> a1.sinks.k1.type = logger
> >>>>
> >>>> a1.channels.c1.type = memory
> >>>> a1.channels.c1.capacity = 10000
> >>>> a1.channels.c1.transactionCapacity = 10000
> >>>>
> >>>> a1.channels.c2.type = file
> >>>> a1.sources.r1.channels = c2
> >>>>
> >>>> a1.sources.r1.interceptors = i1
> >>>> a1.sources.r1.interceptors.i1.type = timestamp
> >>>>
> >>>> a1.sinks = k2
> >>>> a1.sinks.k2.type = hdfs
> >>>> a1.sinks.k2.channel = c2
> >>>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> >>>> a1.sinks.k2.hdfs.writeFormat = Text
> >>>> a1.sinks.k2.hdfs.rollInterval = 10
> >>>> a1.sinks.k2.hdfs.rollSize = 10000000
> >>>> a1.sinks.k2.hdfs.rollCount = 0
> >>>>
> >>>> a1.sinks.k2.hdfs.filePrefix = app
> >>>> a1.sinks.k2.hdfs.fileType = DataStream
> >>>
> >>>
> >>>
> >>> it seems that events were collected correctly.
> >>>
> >>> But there is a problem boring me: Used space of file channel
> (~/.flume) has always increased, even there is no new event.
> >>>
> >>> Is my configuration wrong or other problem?
> >>>
> >>> thanks.
> >>>
> >>>
> >>> Best regards.
> >>>
> >>> Zhiwen Sun
> >>
> >> --
> >> Alexander Alten-Lorenz
> >> http://mapredit.blogspot.com
> >> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> >
> >
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>

Re: Why does the used space of the file channel buffer directory increase?

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Hi,

I suspect tail -F and nc are filling up the directory. What's inside such a
file that keeps growing without any events?

My assumption:
nc opens one stream and delivers all incoming events over it. Flume doesn't
know that no events are coming in, since the stream never closes. I wonder
whether you could use syslog(-ng) for the event delivery?
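
For example, a minimal sketch of swapping the netcat source for a syslog TCP
source (property names as documented for the Flume NG syslog source; the port
is an arbitrary choice, and the rest of the agent config stays unchanged):

a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.201.197
a1.sources.r1.port = 5140
a1.sources.r1.channels = c2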

Cheers,
 Alex



On Mar 20, 2013, at 2:30 AM, Zhiwen Sun <pe...@gmail.com> wrote:

> Thanks all for your reply.
> 
> @Kenison 
> I stop my tail -F | nc program and there is no new event file in HDFS, so I think there is no event arrive. To make sure, I will test again with enable JMX.
> 
> @Alex
> 
> The latest log is following. I can't see any exception or warning.
> 
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
> 
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
> 
> @Hari
> em, 12 hours passed. The size of file channel directory has no reduce.
> 
> Files in file channel directory:
> 
> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12 ./file-channel/data/log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28 ./file-channel/data/log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15 ./file-channel/data/in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6
> 
> 
> 
> 
> 
> Zhiwen Sun 
> 
> 
> 
> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <hs...@cloudera.com> wrote:
> It is possible for the directory size to increase even if no writes are going in to the channel. If the channel size is non-zero and the sink is still writing events to HDFS, the takes get written to disk as well (so we know what events in the files were removed when the channel/agent restarts). Eventually the channel will clean up the files which have all events taken (though it will keep at least 2 files per data directory, just to be safe).
> 
> -- 
> Hari Shreedharan
> 
> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
> 
>> Hey,
>> 
>> what says debug? Do you can gather logs and attach them?
>> 
>> - Alex
>> 
>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com> wrote:
>> 
>>> Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size?
>>> 
>>> 
>>> From: Zhiwen Sun <pe...@gmail.com>
>>> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
>>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>>> To: "user@flume.apache.org" <us...@flume.apache.org>
>>> Subject: Why used space of flie channel buffer directory increase?
>>> 
>>> hi all:
>>> 
>>> I test flume-ng in my local machine. The data flow is :
>>> 
>>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>> 
>>> My configuration file is here :
>>> 
>>>> a1.sources = r1
>>>> a1.channels = c2
>>>> 
>>>> a1.sources.r1.type = netcat
>>>> a1.sources.r1.bind = 192.168.201.197
>>>> a1.sources.r1.port = 44444
>>>> a1.sources.r1.max-line-length = 1000000
>>>> 
>>>> a1.sinks.k1.type = logger
>>>> 
>>>> a1.channels.c1.type = memory
>>>> a1.channels.c1.capacity = 10000
>>>> a1.channels.c1.transactionCapacity = 10000
>>>> 
>>>> a1.channels.c2.type = file
>>>> a1.sources.r1.channels = c2
>>>> 
>>>> a1.sources.r1.interceptors = i1
>>>> a1.sources.r1.interceptors.i1.type = timestamp
>>>> 
>>>> a1.sinks = k2
>>>> a1.sinks.k2.type = hdfs
>>>> a1.sinks.k2.channel = c2
>>>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>>>> a1.sinks.k2.hdfs.writeFormat = Text
>>>> a1.sinks.k2.hdfs.rollInterval = 10
>>>> a1.sinks.k2.hdfs.rollSize = 10000000
>>>> a1.sinks.k2.hdfs.rollCount = 0
>>>> 
>>>> a1.sinks.k2.hdfs.filePrefix = app
>>>> a1.sinks.k2.hdfs.fileType = DataStream
>>> 
>>> 
>>> 
>>> it seems that events were collected correctly.
>>> 
>>> But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event.
>>> 
>>> Is my configuration wrong or other problem?
>>> 
>>> thanks.
>>> 
>>> 
>>> Best regards.
>>> 
>>> Zhiwen Sun
>> 
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> 
> 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF


Re: Why does the used space of the file channel buffer directory increase?

Posted by "Kenison, Matt" <Ma...@disney.com>.
It is capped. You can verify this by using the stress source and a null sink. You'll see the disk usage increase to the maximum allowed and then plateau.
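
For instance, a minimal agent sketch along those lines (agent and component
names here are made up for the example; the stress source and null sink types
are the ones documented for Flume NG):

a2.sources = stress
a2.channels = fc
a2.sinks = drop

a2.sources.stress.type = org.apache.flume.source.StressSource
# size of each synthetic event body, in bytes
a2.sources.stress.size = 500
a2.sources.stress.channels = fc

a2.channels.fc.type = file

# the null sink simply discards every event it takes from the channel
a2.sinks.drop.type = null
a2.sinks.drop.channel = fc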


From: Zhiwen Sun <pe...@gmail.com>
Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
Date: Wed, 20 Mar 2013 02:20:53 -0700
To: "user@flume.apache.org" <us...@flume.apache.org>
Subject: Re: Why used space of flie channel buffer directory increase?

Thanks for your reply.

I just wanna confirm whether the space of file channel has a limit.

Zhiwen Sun



On Wed, Mar 20, 2013 at 4:06 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
If you reduce the capacity the channel will be able to buffer fewer events. If you want to reduce the space used when there are only a few events remaining set the config param: "maxFileSize" to something lower(this is in bytes). I don't advice setting this to lower than a few hundred megabytes (in fact, the default value works pretty well - do you really need to save 3GB space?)- else you will end up having a huge number of small files if there are many events wait to be taken from the channel.


Hari


On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <pe...@gmail.com> wrote:
Hi Hari:

Is that means I can reduce the capacity of file channel to cut down max disk space used by file channel?


Zhiwen Sun



On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
Hi,

Like I mentioned earlier, we will always keep 2 data files in each data directory (the ".meta" files are metadata associated to the actual data). Once a log-8 is created(when log-7 gets rotated when it hits maximum size) and all of the events in log-6 are taken, then log-6 will get deleted, but you will still will see log-7 and log-8. So what you are seeing is not unexpected.


Hari

--
Hari Shreedharan


On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:

Thanks all for your reply.

@Kenison
I stop my tail -F | nc program and there is no new event file in HDFS, so I think there is no event arrive. To make sure, I will test again with enable JMX.

@Alex

The latest log is following. I can't see any exception or warning.

13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp

13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904

@Hari
em, 12 hours passed. The size of file channel directory has no reduce.

Files in file channel directory:

-rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
-rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
-rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
-rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
-rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
-rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12 ./file-channel/data/log-6.meta
-rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28 ./file-channel/data/log-7.meta
-rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15 ./file-channel/data/in_use.lock
-rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6





Zhiwen Sun



On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <hs...@cloudera.com> wrote:
It is possible for the directory size to increase even if no writes are going in to the channel. If the channel size is non-zero and the sink is still writing events to HDFS, the takes get written to disk as well (so we know what events in the files were removed when the channel/agent restarts). Eventually the channel will clean up the files which have all events taken (though it will keep at least 2 files per data directory, just to be safe).

--
Hari Shreedharan


On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:

Hey,

what says debug? Do you can gather logs and attach them?

- Alex

On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com> wrote:

Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size?


From: Zhiwen Sun <pe...@gmail.com>
Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
Date: Tue, 19 Mar 2013 01:19:19 -0700
To: "user@flume.apache.org" <us...@flume.apache.org>
Subject: Why used space of flie channel buffer directory increase?

hi all:

I test flume-ng in my local machine. The data flow is :

tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs

My configuration file is here :

a1.sources = r1
a1.channels = c2

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.201.197
a1.sources.r1.port = 44444
a1.sources.r1.max-line-length = 1000000

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000

a1.channels.c2.type = file
a1.sources.r1.channels = c2

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

a1.sinks = k2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
a1.sinks.k2.hdfs.writeFormat = Text
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 10000000
a1.sinks.k2.hdfs.rollCount = 0

a1.sinks.k2.hdfs.filePrefix = app
a1.sinks.k2.hdfs.fileType = DataStream



it seems that events were collected correctly.

But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event.

Is my configuration wrong or other problem?

thanks.


Best regards.

Zhiwen Sun

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF







Re: Why does the used space of the file channel buffer directory increase?

Posted by Zhiwen Sun <pe...@gmail.com>.
Thanks for your reply.

I just want to confirm whether the disk space used by the file channel has a limit.

Zhiwen Sun



On Wed, Mar 20, 2013 at 4:06 PM, Hari Shreedharan <hshreedharan@cloudera.com
> wrote:

> If you reduce the capacity the channel will be able to buffer fewer
> events. If you want to reduce the space used when there are only a few
> events remaining set the config param: "maxFileSize" to something
> lower(this is in bytes). I don't advice setting this to lower than a few
> hundred megabytes (in fact, the default value works pretty well - do you
> really need to save 3GB space?)- else you will end up having a huge number
> of small files if there are many events wait to be taken from the channel.
>
>
> Hari
>
>
> On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <pe...@gmail.com> wrote:
>
>> Hi Hari:
>>
>> Is that means I can reduce the capacity of file channel to cut down max
>> disk space used by file channel?
>>
>>
>> Zhiwen Sun
>>
>>
>>
>> On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>>  Hi,
>>>
>>> Like I mentioned earlier, we will always keep 2 data files in each data
>>> directory (the ".meta" files are metadata associated to the actual data).
>>> Once a log-8 is created(when log-7 gets rotated when it hits maximum size)
>>> and all of the events in log-6 are taken, then log-6 will get deleted, but
>>> you will still will see log-7 and log-8. So what you are seeing is not
>>> unexpected.
>>>
>>>
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:
>>>
>>> Thanks all for your reply.
>>>
>>> @Kenison
>>> I stop my tail -F | nc program and there is no new event file in HDFS,
>>> so I think there is no event arrive. To make sure, I will test again with
>>> enable JMX.
>>>
>>> @Alex
>>>
>>> The latest log is following. I can't see any exception or warning.
>>>
>>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
>>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
>>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
>>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>>> sync = 3
>>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
>>> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
>>> queueHead: 362981
>>> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta
>>> currentPosition = 216278208, logWriteOrderID = 1363659953997
>>> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
>>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
>>> logWriteOrderID: 1363659953997
>>> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
>>> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
>>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
>>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>>>
>>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
>>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>>> sync = 2
>>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
>>> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
>>> queueHead: 362981
>>> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta
>>> currentPosition = 216288815, logWriteOrderID = 1363659954200
>>> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
>>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
>>> logWriteOrderID: 1363659954200
>>> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
>>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>>>
>>>
>>> @Hari
>>> em, 12 hours passed. The size of file channel directory has no reduce.
>>>
>>> Files in file channel directory:
>>>
>>> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
>>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
>>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
>>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
>>> ./file-channel/data/log-7
>>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
>>> ./file-channel/data/log-6.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
>>> ./file-channel/data/log-7.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
>>> ./file-channel/data/in_use.lock
>>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
>>> ./file-channel/data/log-6
>>>
>>>
>>>
>>>
>>>
>>>
>>> Zhiwen Sun
>>>
>>>
>>>
>>> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
>>> hshreedharan@cloudera.com> wrote:
>>>
>>>  It is possible for the directory size to increase even if no writes are
>>> going in to the channel. If the channel size is non-zero and the sink is
>>> still writing events to HDFS, the takes get written to disk as well (so we
>>> know what events in the files were removed when the channel/agent
>>> restarts). Eventually the channel will clean up the files which have all
>>> events taken (though it will keep at least 2 files per data directory, just
>>> to be safe).
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>>>
>>> Hey,
>>>
>>> what says debug? Do you can gather logs and attach them?
>>>
>>> - Alex
>>>
>>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com>
>>> wrote:
>>>
>>> Check the JMX counter first, to make sure you really are not sending new
>>> events. If not, is it your checkpoint directory or data directory that is
>>> increasing in size?
>>>
>>>
>>> From: Zhiwen Sun <pe...@gmail.com>
>>> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
>>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>>> To: "user@flume.apache.org" <us...@flume.apache.org>
>>> Subject: Why used space of flie channel buffer directory increase?
>>>
>>> hi all:
>>>
>>> I test flume-ng in my local machine. The data flow is :
>>>
>>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>>
>>> My configuration file is here :
>>>
>>> a1.sources = r1
>>> a1.channels = c2
>>>
>>> a1.sources.r1.type = netcat
>>> a1.sources.r1.bind = 192.168.201.197
>>> a1.sources.r1.port = 44444
>>> a1.sources.r1.max-line-length = 1000000
>>>
>>> a1.sinks.k1.type = logger
>>>
>>> a1.channels.c1.type = memory
>>> a1.channels.c1.capacity = 10000
>>> a1.channels.c1.transactionCapacity = 10000
>>>
>>> a1.channels.c2.type = file
>>> a1.sources.r1.channels = c2
>>>
>>> a1.sources.r1.interceptors = i1
>>> a1.sources.r1.interceptors.i1.type = timestamp
>>>
>>> a1.sinks = k2
>>> a1.sinks.k2.type = hdfs
>>> a1.sinks.k2.channel = c2
>>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>>> a1.sinks.k2.hdfs.writeFormat = Text
>>> a1.sinks.k2.hdfs.rollInterval = 10
>>> a1.sinks.k2.hdfs.rollSize = 10000000
>>> a1.sinks.k2.hdfs.rollCount = 0
>>>
>>> a1.sinks.k2.hdfs.filePrefix = app
>>> a1.sinks.k2.hdfs.fileType = DataStream
>>>
>>>
>>>
>>>
>>> it seems that events were collected correctly.
>>>
>>> But there is a problem boring me: Used space of file channel (~/.flume)
>>> has always increased, even there is no new event.
>>>
>>> Is my configuration wrong or other problem?
>>>
>>> thanks.
>>>
>>>
>>> Best regards.
>>>
>>> Zhiwen Sun
>>>
>>>
>>> --
>>> Alexander Alten-Lorenz
>>> http://mapredit.blogspot.com
>>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>>
>>>
>>>
>>>
>>>
>>
>

Re: Why does the used space of the file channel buffer directory increase?

Posted by Hari Shreedharan <hs...@cloudera.com>.
If you reduce the capacity, the channel will be able to buffer fewer events.
If you want to reduce the space used when there are only a few events
remaining, set the config param "maxFileSize" to something lower (this is in
bytes). I don't advise setting this lower than a few hundred megabytes (in
fact, the default value works pretty well - do you really need to save 3 GB
of space?) - else you will end up with a huge number of small files if there
are many events waiting to be taken from the channel.
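
For example, a sketch against the channel from the original config (the value
is illustrative, maxFileSize is in bytes, and the directory properties are
optional - shown only to make the on-disk location explicit):

# cap each file channel data file at ~512 MB instead of the default (~2 GB)
a1.channels.c2.maxFileSize = 536870912
a1.channels.c2.checkpointDir = /home/zhiwensun/.flume/file-channel/checkpoint
a1.channels.c2.dataDirs = /home/zhiwensun/.flume/file-channel/data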


Hari


On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <pe...@gmail.com> wrote:

> Hi Hari:
>
> Is that means I can reduce the capacity of file channel to cut down max
> disk space used by file channel?
>
>
> Zhiwen Sun
>
>
>
> On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>>  Hi,
>>
>> Like I mentioned earlier, we will always keep 2 data files in each data
>> directory (the ".meta" files are metadata associated to the actual data).
>> Once a log-8 is created(when log-7 gets rotated when it hits maximum size)
>> and all of the events in log-6 are taken, then log-6 will get deleted, but
>> you will still will see log-7 and log-8. So what you are seeing is not
>> unexpected.
>>
>>
>> Hari
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:
>>
>> Thanks all for your reply.
>>
>> @Kenison
>> I stop my tail -F | nc program and there is no new event file in HDFS, so
>> I think there is no event arrive. To make sure, I will test again with
>> enable JMX.
>>
>> @Alex
>>
>> The latest log is following. I can't see any exception or warning.
>>
>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>> sync = 3
>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
>> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
>> queueHead: 362981
>> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta
>> currentPosition = 216278208, logWriteOrderID = 1363659953997
>> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
>> logWriteOrderID: 1363659953997
>> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
>> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>>
>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>> sync = 2
>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
>> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
>> queueHead: 362981
>> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta
>> currentPosition = 216288815, logWriteOrderID = 1363659954200
>> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
>> logWriteOrderID: 1363659954200
>> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>>
>>
>> @Hari
>> em, 12 hours passed. The size of file channel directory has no reduce.
>>
>> Files in file channel directory:
>>
>> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
>> ./file-channel/data/log-7
>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
>> ./file-channel/data/log-6.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
>> ./file-channel/data/log-7.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
>> ./file-channel/data/in_use.lock
>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
>> ./file-channel/data/log-6
>>
>>
>>
>>
>>
>>
>> Zhiwen Sun
>>
>>
>>
>> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>  It is possible for the directory size to increase even if no writes are
>> going in to the channel. If the channel size is non-zero and the sink is
>> still writing events to HDFS, the takes get written to disk as well (so we
>> know what events in the files were removed when the channel/agent
>> restarts). Eventually the channel will clean up the files which have all
>> events taken (though it will keep at least 2 files per data directory, just
>> to be safe).
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>>
>> Hey,
>>
>> what says debug? Do you can gather logs and attach them?
>>
>> - Alex
>>
>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com>
>> wrote:
>>
>> Check the JMX counter first, to make sure you really are not sending new
>> events. If not, is it your checkpoint directory or data directory that is
>> increasing in size?
>>
>>
>> From: Zhiwen Sun <pe...@gmail.com>
>> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>> To: "user@flume.apache.org" <us...@flume.apache.org>
>> Subject: Why used space of flie channel buffer directory increase?
>>
>> hi all:
>>
>> I test flume-ng in my local machine. The data flow is :
>>
>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>
>> My configuration file is here :
>>
>> a1.sources = r1
>> a1.channels = c2
>>
>> a1.sources.r1.type = netcat
>> a1.sources.r1.bind = 192.168.201.197
>> a1.sources.r1.port = 44444
>> a1.sources.r1.max-line-length = 1000000
>>
>> a1.sinks.k1.type = logger
>>
>> a1.channels.c1.type = memory
>> a1.channels.c1.capacity = 10000
>> a1.channels.c1.transactionCapacity = 10000
>>
>> a1.channels.c2.type = file
>> a1.sources.r1.channels = c2
>>
>> a1.sources.r1.interceptors = i1
>> a1.sources.r1.interceptors.i1.type = timestamp
>>
>> a1.sinks = k2
>> a1.sinks.k2.type = hdfs
>> a1.sinks.k2.channel = c2
>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>> a1.sinks.k2.hdfs.writeFormat = Text
>> a1.sinks.k2.hdfs.rollInterval = 10
>> a1.sinks.k2.hdfs.rollSize = 10000000
>> a1.sinks.k2.hdfs.rollCount = 0
>>
>> a1.sinks.k2.hdfs.filePrefix = app
>> a1.sinks.k2.hdfs.fileType = DataStream
>>
>>
>>
>>
>> it seems that events were collected correctly.
>>
>> But there is a problem boring me: Used space of file channel (~/.flume)
>> has always increased, even there is no new event.
>>
>> Is my configuration wrong or other problem?
>>
>> thanks.
>>
>>
>> Best regards.
>>
>> Zhiwen Sun
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>
>>
>>
>>
>>
>

Re: Why does the used space of the file channel buffer directory increase?

Posted by Zhiwen Sun <pe...@gmail.com>.
Hi Hari,

Does that mean I can reduce the capacity of the file channel to cut down the
maximum disk space used by the file channel?


Zhiwen Sun



On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <hshreedharan@cloudera.com
> wrote:

>  Hi,
>
> Like I mentioned earlier, we will always keep 2 data files in each data
> directory (the ".meta" files are metadata associated to the actual data).
> Once a log-8 is created(when log-7 gets rotated when it hits maximum size)
> and all of the events in log-6 are taken, then log-6 will get deleted, but
> you will still will see log-7 and log-8. So what you are seeing is not
> unexpected.
>
>
> Hari
>
> --
> Hari Shreedharan
>
> On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:
>
> Thanks all for your reply.
>
> @Kenison
> I stop my tail -F | nc program and there is no new event file in HDFS, so
> I think there is no event arrive. To make sure, I will test again with
> enable JMX.
>
> @Alex
>
> The latest log is following. I can't see any exception or warning.
>
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 3
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
> queueHead: 362981
> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition
> = 216278208, logWriteOrderID = 1363659953997
> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
> logWriteOrderID: 1363659953997
> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 2
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
> queueHead: 362981
> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition
> = 216288815, logWriteOrderID = 1363659954200
> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
> logWriteOrderID: 1363659954200
> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>
>
> @Hari
> em, 12 hours passed. The size of file channel directory has no reduce.
>
> Files in file channel directory:
>
> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
> ./file-channel/data/log-7
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
> ./file-channel/data/log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
> ./file-channel/data/log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
> ./file-channel/data/in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
> ./file-channel/data/log-6
>
>
>
>
>
>
> Zhiwen Sun
>
>
>
> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>  It is possible for the directory size to increase even if no writes are
> going in to the channel. If the channel size is non-zero and the sink is
> still writing events to HDFS, the takes get written to disk as well (so we
> know what events in the files were removed when the channel/agent
> restarts). Eventually the channel will clean up the files which have all
> events taken (though it will keep at least 2 files per data directory, just
> to be safe).
>
> --
> Hari Shreedharan
>
> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>
> Hey,
>
> what says debug? Do you can gather logs and attach them?
>
> - Alex
>
> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com>
> wrote:
>
> Check the JMX counter first, to make sure you really are not sending new
> events. If not, is it your checkpoint directory or data directory that is
> increasing in size?
>
>
> From: Zhiwen Sun <pe...@gmail.com>
> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
> Date: Tue, 19 Mar 2013 01:19:19 -0700
> To: "user@flume.apache.org" <us...@flume.apache.org>
> Subject: Why used space of flie channel buffer directory increase?
>
> hi all:
>
> I test flume-ng in my local machine. The data flow is :
>
> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>
> My configuration file is here :
>
> a1.sources = r1
> a1.channels = c2
>
> a1.sources.r1.type = netcat
> a1.sources.r1.bind = 192.168.201.197
> a1.sources.r1.port = 44444
> a1.sources.r1.max-line-length = 1000000
>
> a1.sinks.k1.type = logger
>
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 10000
> a1.channels.c1.transactionCapacity = 10000
>
> a1.channels.c2.type = file
> a1.sources.r1.channels = c2
>
> a1.sources.r1.interceptors = i1
> a1.sources.r1.interceptors.i1.type = timestamp
>
> a1.sinks = k2
> a1.sinks.k2.type = hdfs
> a1.sinks.k2.channel = c2
> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> a1.sinks.k2.hdfs.writeFormat = Text
> a1.sinks.k2.hdfs.rollInterval = 10
> a1.sinks.k2.hdfs.rollSize = 10000000
> a1.sinks.k2.hdfs.rollCount = 0
>
> a1.sinks.k2.hdfs.filePrefix = app
> a1.sinks.k2.hdfs.fileType = DataStream
>
>
>
>
> it seems that events were collected correctly.
>
> But there is a problem boring me: Used space of file channel (~/.flume)
> has always increased, even there is no new event.
>
> Is my configuration wrong or other problem?
>
> thanks.
>
>
> Best regards.
>
> Zhiwen Sun
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
>
>
>

Re: Why does the used space of the file channel buffer directory increase?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Hi, 

Like I mentioned earlier, we will always keep 2 data files in each data directory (the ".meta" files are metadata associated with the actual data). Once log-8 is created (when log-7 gets rotated after it hits its maximum size) and all of the events in log-6 are taken, log-6 will get deleted, but you will still see log-7 and log-8. So what you are seeing is not unexpected.
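
One simple way to watch that rotation and cleanup happen over time (a generic
shell sketch; the path assumes the default ~/.flume location used in this
thread):

  watch -n 60 'ls -lh ~/.flume/file-channel/data/'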


Hari 

-- 
Hari Shreedharan


On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:

> Thanks all for your reply.
> 
> @Kenison 
> I stop my tail -F | nc program and there is no new event file in HDFS, so I think there is no event arrive. To make sure, I will test again with enable JMX.
> 
> @Alex
> 
> The latest log is following. I can't see any exception or warning.
> 
> > 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> > 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> > 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
> > 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
> > 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
> > 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
> > 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> > 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> > 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> > 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
> > 
> > 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
> > 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
> > 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
> > 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
> > 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
> 
> @Hari
> em, 12 hours passed. The size of file channel directory has no reduce.
> 
> Files in file channel directory:
> 
> > -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> > -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> > -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> > -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
> > -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12 ./file-channel/data/log-6.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28 ./file-channel/data/log-7.meta
> > -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15 ./file-channel/data/in_use.lock
> > -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6
> 
> 
> 
> 
> 
> Zhiwen Sun 
> 
> 
> 
> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> > It is possible for the directory size to increase even if no writes are going in to the channel. If the channel size is non-zero and the sink is still writing events to HDFS, the takes get written to disk as well (so we know what events in the files were removed when the channel/agent restarts). Eventually the channel will clean up the files which have all events taken (though it will keep at least 2 files per data directory, just to be safe). 
> > 
> > -- 
> > Hari Shreedharan
> > 
> > 
> > On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
> > 
> > > Hey,
> > > 
> > > what says debug? Do you can gather logs and attach them?
> > > 
> > > - Alex
> > > 
> > > On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Matt.Kenison@disney.com> wrote:
> > > 
> > > > Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size? 
> > > > 
> > > > 
> > > > From: Zhiwen Sun <pensz01@gmail.com>
> > > > Reply-To: "user@flume.apache.org" <user@flume.apache.org>
> > > > Date: Tue, 19 Mar 2013 01:19:19 -0700
> > > > To: "user@flume.apache.org" <user@flume.apache.org>
> > > > Subject: Why used space of flie channel buffer directory increase?
> > > > 
> > > > hi all:
> > > > 
> > > > I test flume-ng in my local machine. The data flow is :
> > > > 
> > > > tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs 
> > > > 
> > > > My configuration file is here :
> > > > 
> > > > > a1.sources = r1
> > > > > a1.channels = c2
> > > > > 
> > > > > a1.sources.r1.type = netcat
> > > > > a1.sources.r1.bind = 192.168.201.197
> > > > > a1.sources.r1.port = 44444
> > > > > a1.sources.r1.max-line-length = 1000000
> > > > > 
> > > > > a1.sinks.k1.type = logger
> > > > > 
> > > > > a1.channels.c1.type = memory
> > > > > a1.channels.c1.capacity = 10000
> > > > > a1.channels.c1.transactionCapacity = 10000
> > > > > 
> > > > > a1.channels.c2.type = file
> > > > > a1.sources.r1.channels = c2
> > > > > 
> > > > > a1.sources.r1.interceptors = i1
> > > > > a1.sources.r1.interceptors.i1.type = timestamp
> > > > > 
> > > > > a1.sinks = k2
> > > > > a1.sinks.k2.type = hdfs
> > > > > a1.sinks.k2.channel = c2 
> > > > > a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> > > > > a1.sinks.k2.hdfs.writeFormat = Text
> > > > > a1.sinks.k2.hdfs.rollInterval = 10
> > > > > a1.sinks.k2.hdfs.rollSize = 10000000
> > > > > a1.sinks.k2.hdfs.rollCount = 0
> > > > > 
> > > > > a1.sinks.k2.hdfs.filePrefix = app 
> > > > > a1.sinks.k2.hdfs.fileType = DataStream
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > it seems that events were collected correctly.
> > > > 
> > > > But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event. 
> > > > 
> > > > Is my configuration wrong or other problem? 
> > > > 
> > > > thanks.
> > > > 
> > > > 
> > > > Best regards.
> > > > 
> > > > Zhiwen Sun 
> > > 
> > > --
> > > Alexander Alten-Lorenz
> > > http://mapredit.blogspot.com
> > > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> > > 
> > > 
> > > 
> > 
> > 
> 


Re: Why does the used space of the file channel buffer directory increase?

Posted by Zhiwen Sun <pe...@gmail.com>.
Thanks all for your reply.

@Kenison
I stopped my tail -F | nc program and there are no new event files in HDFS,
so I think no events are arriving. To make sure, I will test again with JMX
enabled.
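
For the re-test, one way to expose the channel counters (a sketch only: the
agent name a1 matches the config above, while the config file name and the
monitoring port are placeholder assumptions; JMX itself can alternatively be
enabled via the usual com.sun.management.jmxremote JVM flags in flume-env.sh):

  flume-ng agent -n a1 -f flume.conf -c conf \
    -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

  # counters such as ChannelSize or EventPutSuccessCount for channel c2 are
  # then served as JSON at:
  curl http://localhost:34545/metrics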

@Alex

The latest log follows. I can't see any exceptions or warnings.

13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 3
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
> queueHead: 362981
> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition
> = 216278208, logWriteOrderID = 1363659953997
> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
> logWriteOrderID: 1363659953997
> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
> sync = 2
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
> queueHead: 362981
> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition
> = 216288815, logWriteOrderID = 1363659954200
> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
> logWriteOrderID: 1363659954200
> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>

@Hari
Hmm, 12 hours have passed and the size of the file channel directory has not decreased.

Files in the file channel directory:

-rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
> ./file-channel/data/log-7
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
> ./file-channel/data/log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
> ./file-channel/data/log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
> ./file-channel/data/in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
> ./file-channel/data/log-6
>
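
For reference, one quick way to see whether it is the checkpoint directory or
the data directory that is growing (a minimal sketch; the paths assume the
default ~/.flume layout shown above):

  # total size of each directory
  du -sh ~/.flume/file-channel/checkpoint ~/.flume/file-channel/data
  # individual data files with sizes
  ls -lh ~/.flume/file-channel/data/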





Zhiwen Sun



On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <hshreedharan@cloudera.com
> wrote:

>  It is possible for the directory size to increase even if no writes are
> going in to the channel. If the channel size is non-zero and the sink is
> still writing events to HDFS, the takes get written to disk as well (so we
> know what events in the files were removed when the channel/agent
> restarts). Eventually the channel will clean up the files which have all
> events taken (though it will keep at least 2 files per data directory, just
> to be safe).
>
> --
> Hari Shreedharan
>
> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>
> Hey,
>
> what says debug? Do you can gather logs and attach them?
>
> - Alex
>
> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com>
> wrote:
>
> Check the JMX counter first, to make sure you really are not sending new
> events. If not, is it your checkpoint directory or data directory that is
> increasing in size?
>
>
> From: Zhiwen Sun <pe...@gmail.com>
> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
> Date: Tue, 19 Mar 2013 01:19:19 -0700
> To: "user@flume.apache.org" <us...@flume.apache.org>
> Subject: Why used space of flie channel buffer directory increase?
>
> hi all:
>
> I test flume-ng in my local machine. The data flow is :
>
> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>
> My configuration file is here :
>
> a1.sources = r1
> a1.channels = c2
>
> a1.sources.r1.type = netcat
> a1.sources.r1.bind = 192.168.201.197
> a1.sources.r1.port = 44444
> a1.sources.r1.max-line-length = 1000000
>
> a1.sinks.k1.type = logger
>
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 10000
> a1.channels.c1.transactionCapacity = 10000
>
> a1.channels.c2.type = file
> a1.sources.r1.channels = c2
>
> a1.sources.r1.interceptors = i1
> a1.sources.r1.interceptors.i1.type = timestamp
>
> a1.sinks = k2
> a1.sinks.k2.type = hdfs
> a1.sinks.k2.channel = c2
> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> a1.sinks.k2.hdfs.writeFormat = Text
> a1.sinks.k2.hdfs.rollInterval = 10
> a1.sinks.k2.hdfs.rollSize = 10000000
> a1.sinks.k2.hdfs.rollCount = 0
>
> a1.sinks.k2.hdfs.filePrefix = app
> a1.sinks.k2.hdfs.fileType = DataStream
>
>
>
>
> it seems that events were collected correctly.
>
> But there is a problem boring me: Used space of file channel (~/.flume)
> has always increased, even there is no new event.
>
> Is my configuration wrong or other problem?
>
> thanks.
>
>
> Best regards.
>
> Zhiwen Sun
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
>

Re: Why used space of file channel buffer directory increase?

Posted by Hari Shreedharan <hs...@cloudera.com>.
It is possible for the directory size to increase even if no writes are going into the channel. If the channel size is non-zero and the sink is still writing events to HDFS, the takes are written to disk as well (so that we know which events in the files were removed when the channel/agent restarts). Eventually the channel will clean up files from which all events have been taken (though it will keep at least 2 files per data directory, just to be safe).
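
For anyone tuning this, here is a minimal sketch of the file channel properties
that control where these log files live and how large each data file may grow,
reusing the a1/c2 names from the configuration quoted below (checkpointDir,
dataDirs, and maxFileSize are standard file channel settings; the directory
paths and values here are only examples):

  a1.channels.c2.type = file
  # explicit locations instead of the ~/.flume default
  a1.channels.c2.checkpointDir = /var/lib/flume/file-channel/checkpoint
  a1.channels.c2.dataDirs = /var/lib/flume/file-channel/data
  a1.channels.c2.capacity = 1000000
  a1.channels.c2.transactionCapacity = 10000
  # maximum size (bytes) of a single data file before a new log-N file is started
  a1.channels.c2.maxFileSize = 157286400

Smaller data files should let the channel reclaim space in finer-grained steps,
since a file can only be deleted once every event in it has been taken.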

-- 
Hari Shreedharan


On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:

> Hey,
> 
> what says debug? Do you can gather logs and attach them?
> 
> - Alex
> 
> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Matt.Kenison@disney.com (mailto:Matt.Kenison@disney.com)> wrote:
> 
> > Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size?
> > 
> > 
> > From: Zhiwen Sun <pensz01@gmail.com (mailto:pensz01@gmail.com)>
> > Reply-To: "user@flume.apache.org (mailto:user@flume.apache.org)" <user@flume.apache.org (mailto:user@flume.apache.org)>
> > Date: Tue, 19 Mar 2013 01:19:19 -0700
> > To: "user@flume.apache.org (mailto:user@flume.apache.org)" <user@flume.apache.org (mailto:user@flume.apache.org)>
> > Subject: Why used space of flie channel buffer directory increase?
> > 
> > hi all:
> > 
> > I test flume-ng in my local machine. The data flow is :
> > 
> > tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
> > 
> > My configuration file is here :
> > 
> > > a1.sources = r1
> > > a1.channels = c2
> > > 
> > > a1.sources.r1.type = netcat
> > > a1.sources.r1.bind = 192.168.201.197
> > > a1.sources.r1.port = 44444
> > > a1.sources.r1.max-line-length = 1000000
> > > 
> > > a1.sinks.k1.type = logger
> > > 
> > > a1.channels.c1.type = memory
> > > a1.channels.c1.capacity = 10000
> > > a1.channels.c1.transactionCapacity = 10000
> > > 
> > > a1.channels.c2.type = file
> > > a1.sources.r1.channels = c2
> > > 
> > > a1.sources.r1.interceptors = i1
> > > a1.sources.r1.interceptors.i1.type = timestamp
> > > 
> > > a1.sinks = k2
> > > a1.sinks.k2.type = hdfs
> > > a1.sinks.k2.channel = c2 
> > > a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
> > > a1.sinks.k2.hdfs.writeFormat = Text
> > > a1.sinks.k2.hdfs.rollInterval = 10
> > > a1.sinks.k2.hdfs.rollSize = 10000000
> > > a1.sinks.k2.hdfs.rollCount = 0
> > > 
> > > a1.sinks.k2.hdfs.filePrefix = app 
> > > a1.sinks.k2.hdfs.fileType = DataStream
> > > 
> > 
> > 
> > 
> > 
> > it seems that events were collected correctly.
> > 
> > But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event.
> > 
> > Is my configuration wrong or other problem? 
> > 
> > thanks.
> > 
> > 
> > Best regards.
> > 
> > Zhiwen Sun 
> 
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> 
> 



Re: Why used space of file channel buffer directory increase?

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Hey,

What does debug say? Can you gather logs and attach them?
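
(If it helps, here is one way to get debug output, assuming the stock flume-ng
launcher and a conf/ directory; the config file name a1.conf is just a
placeholder for your own file:)

  bin/flume-ng agent --conf conf --conf-file conf/a1.conf --name a1 \
      -Dflume.root.logger=DEBUG,console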

- Alex

On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Ma...@disney.com> wrote:

> Check the JMX counter first, to make sure you really are not sending new events. If not, is it your checkpoint directory or data directory that is increasing in size?
> 
> 
> From: Zhiwen Sun <pe...@gmail.com>
> Reply-To: "user@flume.apache.org" <us...@flume.apache.org>
> Date: Tue, 19 Mar 2013 01:19:19 -0700
> To: "user@flume.apache.org" <us...@flume.apache.org>
> Subject: Why used space of flie channel buffer directory increase?
> 
> hi all:
> 
> I test flume-ng in my local machine. The data flow is :
> 
>   tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
> 
> My configuration file is here :
> 
>> a1.sources = r1
>> a1.channels = c2
>> 
>> a1.sources.r1.type = netcat
>> a1.sources.r1.bind = 192.168.201.197
>> a1.sources.r1.port = 44444
>> a1.sources.r1.max-line-length = 1000000
>> 
>> a1.sinks.k1.type = logger
>> 
>> a1.channels.c1.type = memory
>> a1.channels.c1.capacity = 10000
>> a1.channels.c1.transactionCapacity = 10000
>> 
>> a1.channels.c2.type = file
>> a1.sources.r1.channels = c2
>> 
>> a1.sources.r1.interceptors = i1
>> a1.sources.r1.interceptors.i1.type = timestamp
>> 
>> a1.sinks = k2
>> a1.sinks.k2.type = hdfs
>> a1.sinks.k2.channel = c2  
>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>> a1.sinks.k2.hdfs.writeFormat = Text
>> a1.sinks.k2.hdfs.rollInterval = 10
>> a1.sinks.k2.hdfs.rollSize = 10000000
>> a1.sinks.k2.hdfs.rollCount = 0
>> 
>> a1.sinks.k2.hdfs.filePrefix = app 
>> a1.sinks.k2.hdfs.fileType = DataStream
> 
> 
> 
> it seems that events were collected correctly.
> 
> But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event.
> 
> Is my configuration wrong or other problem? 
> 
> thanks.
> 
> 
> Best regards.
> 
> Zhiwen Sun 
> 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF


Re: Why used space of file channel buffer directory increase?

Posted by "Kenison, Matt" <Ma...@disney.com>.
Check the JMX counters first, to make sure you really are not sending new events. If you are not, is it your checkpoint directory or your data directory that is increasing in size?
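
For example, one way to expose those counters is the built-in JSON reporter
(a sketch only; it is available in Flume 1.3 and later, the port is arbitrary,
and conf/a1.conf is a placeholder for your own config file):

  bin/flume-ng agent --conf conf --conf-file conf/a1.conf --name a1 \
      -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
  # channel fill level and put/take counts show up here:
  curl http://localhost:34545/metrics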


From: Zhiwen Sun <pe...@gmail.com>>
Reply-To: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
Date: Tue, 19 Mar 2013 01:19:19 -0700
To: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
Subject: Why used space of flie channel buffer directory increase?

hi all:

I test flume-ng in my local machine. The data flow is :

  tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs

My configuration file is here :

a1.sources = r1
a1.channels = c2

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.201.197
a1.sources.r1.port = 44444
a1.sources.r1.max-line-length = 1000000

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000

a1.channels.c2.type = file
a1.sources.r1.channels = c2

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

a1.sinks = k2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d<http://127.0.0.1:9000/flume/events/%Y-%m-%d>
a1.sinks.k2.hdfs.writeFormat = Text
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 10000000
a1.sinks.k2.hdfs.rollCount = 0

a1.sinks.k2.hdfs.filePrefix = app
a1.sinks.k2.hdfs.fileType = DataStream



it seems that events were collected correctly.

But there is a problem boring me: Used space of file channel (~/.flume) has always increased, even there is no new event.

Is my configuration wrong or other problem?

thanks.


Best regards.

Zhiwen Sun