Posted to user@flume.apache.org by Umesh Telang <Um...@bbc.co.uk> on 2014/01/29 18:02:10 UTC

checkpoint lifecycle

Hello,

Under a file channel's checkpoint directory, I see the following files:
checkpoint
checkpoint.meta

I wanted to know whether the size of the checkpoint file should reach a steady state if the amount and rate of input to the file channel remain the same.

My understanding is that the checkpoint file is associated with the write-ahead log. Is this something that continues to grow indefinitely?

Or is there some lifecycle management that cleans out very old entries from the write-ahead log?

If not, is there some strategy that we should employ to manage the size of the checkpoint file? (In our case, it's currently over 1GB after 2 days' operation.)

Thanks for any advice on this.

Kind regards,
Umesh




----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------

RE: checkpoint lifecycle

Posted by Umesh Telang <Um...@bbc.co.uk>.

Thanks very much, Brock, for all your help.



Re: checkpoint lifecycle

Posted by Brock Noland <br...@cloudera.com>.

On Thu, Jan 30, 2014 at 9:29 AM, Umesh Telang <Um...@bbc.co.uk> wrote:

>  Ah, ok. So 32 bytes is required for each pointer to an event.
>

Yep :)


> We'll amend our heap size accordingly. We may also be able to reduce our
> FileChannel size. We hadn't understood the implications of the capacity
> value of the FileChannel we have been using.
>
>  Regarding the multiple data directories, I hadn't realised that that
> implied distinct disks. Just to confirm, you're saying that each data
> directory has to be on a distinct disk?
>

The recommendation is that you have two data directories per distinct disk.


> Is it that FileChannel can't utilise an entire disk from an IO
> perspective, regardless of how big the disk is?
>

Right, it has nothing to do with size and everything to do with IO
bandwidth. We could optimize this area (and will), but for now specifying
two data directories per disk is a good workaround.
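
The "two data directories per disk" advice can be sketched as a small Python helper that builds the comma-separated dataDirs value. The mount points and the "flume-data-N" directory names are hypothetical; only the dataDirs property and the agent/channel names come from this thread.

```python
# Sketch: build a dataDirs value with two data directories per physical disk.
# Assumption: the mount points and "flume-data-N" names are made up for
# illustration; adjust them to the disks actually available on the host.


def data_dirs_for(mount_points, dirs_per_disk=2):
    """Return a comma-separated dataDirs value, `dirs_per_disk` per mount."""
    dirs = []
    for mount in mount_points:
        for i in range(1, dirs_per_disk + 1):
            dirs.append(f"{mount.rstrip('/')}/flume-data-{i}")
    return ",".join(dirs)


disks = ["/mnt/disk1", "/mnt/disk2"]  # hypothetical mount points
print(f"a1.channels.s3-file-channel.dataDirs = {data_dirs_for(disks)}")
```

The number of directories follows the number of disks, not the channel capacity, which matches the point that this is about IO bandwidth rather than size.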


> Or is this size-dependent? i.e. above a certain size, you need a second
> data directory? If the latter, could you let me know what that size is?
> If it's a general point, then I'll follow the earlier advice of 2 data
> dirs per file channel.
>

Doesn't relate to size.


>
>  Apologies for all the questions!
>
>  We had made an estimation of disk space (avg event size (~250 bytes)  *
> channel size (150M)) and have provisioned disks that are significantly
> larger than the required space.
>

Perfect, great to hear!
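
For reference, the disk estimate quoted above (avg event size * channel size) works out as follows. This is only the arithmetic from the thread, written as a Python sketch; the function name is illustrative, and the result ignores any per-event overhead in the data files.

```python
# Disk needed to hold a full channel: average event size * channel capacity.
# Both inputs are the figures quoted in the thread (~250 bytes, 150M events).


def full_channel_bytes(avg_event_bytes, capacity):
    """Return the raw payload size of a completely full channel."""
    return avg_event_bytes * capacity


needed = full_channel_bytes(250, 150_000_000)
print(f"full channel: {needed / 1e9:.1f} GB")  # prints "full channel: 37.5 GB"
```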

>
>  Thanks,
> Umesh
>


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

RE: checkpoint lifecycle

Posted by Umesh Telang <Um...@bbc.co.uk>.

Ah, ok. So 32 bytes is required for each pointer to an event. We'll amend our heap size accordingly. We may also be able to reduce our FileChannel size. We hadn't understood the implications of the capacity value of the FileChannel we have been using.

Regarding the multiple data directories, I hadn't realised that that implied distinct disks. Just to confirm, you're saying that each data directory has to be on a distinct disk?

Is it that FileChannel can't utilise an entire disk from an IO perspective, regardless of how big the disk is? Or is this size-dependent? i.e. above a certain size, you need a second data directory? If the latter, could you let me know what that size is?
If it's a general point, then I'll follow the earlier advice of 2 data dirs per file channel.

Apologies for all the questions!

We had made an estimation of disk space (avg event size (~250 bytes)  * channel size (150M)) and have provisioned disks that are significantly larger than the required space.

Thanks,
Umesh





Re: checkpoint lifecycle

Posted by Brock Noland <br...@cloudera.com>.

On Thu, Jan 30, 2014 at 8:16 AM, Umesh Telang <Um...@bbc.co.uk> wrote:

>  Hi Brock,
>
>  Our heap size is 2GB.
>

That is not enough heap for 150M events. It's 150 million * 32 bytes =
4.5GB + say 100-500MB for the rest of Flume.
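
As a rough check of that figure, the arithmetic can be sketched in Python. The 32 bytes per event and the 100-500MB allowance are taken from the reply above; the helper name and the choice to report GiB (2^30 bytes) are illustrative assumptions. Note that 150 million * 32 bytes is 4.8 GB in decimal units, which is roughly 4.5 when expressed in GiB.

```python
# Rough heap estimate for a file channel, using the figures quoted above.
# Assumption: 32 bytes of heap per event slot (from the reply); the helper
# name and the GiB reporting are illustrative, not part of Flume.

BYTES_PER_EVENT_POINTER = 32
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_heap_bytes(capacity, overhead_bytes):
    """Return heap needed for `capacity` event pointers plus fixed overhead."""
    return capacity * BYTES_PER_EVENT_POINTER + overhead_bytes


capacity = 150_000_000
low = estimate_heap_bytes(capacity, 100 * MIB)
high = estimate_heap_bytes(capacity, 500 * MIB)

print(f"pointers only: {capacity * BYTES_PER_EVENT_POINTER / GIB:.2f} GiB")
print(f"with overhead: {low / GIB:.2f} to {high / GIB:.2f} GiB")
```

Either way, the result is well above a 2GB heap.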

>
>  Thanks for the advice on data directories. Could you please let me know
> the heuristic for that?   (e.g. 1 data directory per N-sized channel where
> N is...)
>

File channel at present cannot utilize an entire disk from an IO
perspective, which is why I suggest multiple disks. Of course you'll want to
ensure that you have enough disk to support a full channel, but that is a
different discussion (avg event size * channel size).

>
>  Thanks also for suggesting back up checkpoints - are these something
> that increases the integrity of Flume's execution in an automatic fashion,
> or does it aid in some form of manual recovery?
>

Automatic. If Flume is killed or shut down during a checkpoint, that
checkpoint is invalid, and unless a backup checkpoint exists a full replay
will have to take place. Furthermore, without FLUME-2155, full replays are
very time-consuming under certain conditions.
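
A minimal sketch of what enabling a backup checkpoint might look like, written as a Python helper that renders the channel properties. The property names useDualCheckpoints and backupCheckpointDir are taken from the Flume user guide's File Channel section rather than from this thread, so check that they exist in your Flume version; the backup path is hypothetical.

```python
# Sketch: render file channel properties that enable a backup checkpoint.
# Assumptions: `useDualCheckpoints` and `backupCheckpointDir` come from the
# Flume user guide, not from this thread; the backup path is hypothetical.


def backup_checkpoint_props(agent, channel, checkpoint_dir, backup_dir):
    """Return property lines for a channel with a backup checkpoint."""
    if backup_dir == checkpoint_dir:
        raise ValueError("backup checkpoint must use a different directory")
    prefix = f"{agent}.channels.{channel}"
    return [
        f"{prefix}.checkpointDir = {checkpoint_dir}",
        f"{prefix}.useDualCheckpoints = true",
        f"{prefix}.backupCheckpointDir = {backup_dir}",
    ]


for line in backup_checkpoint_props(
    "a1",
    "s3-file-channel",
    "/mnt/flume-file-channels/s3-file-channel/checkpoint",
    "/mnt/flume-file-channels/s3-file-channel/checkpoint-backup",
):
    print(line)
```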

>
>  Re: FLUME-2155, I've scanned through it, and will read it in more
> detail. I'm not sure about the unit of measurement for some of the metrics
> (milliseconds?), but is there any guidance as to at which order of
> magnitude (10^4, 10^6 or 10^8 ?) the channel size causes the replay issue
> to become apparent?
>

It's not purely about channel size. Specifically it's about:

1) Large channel size
2) Having a large number of events in your channel (queue depth)
3) Having run the channel for some time such that old WALs were cleaned up
(causing there to be removes for which no event exists)
4) Performing a full replay in these conditions

Generally I wouldn't go over a 1M channel size without backup checkpoint,
this change, or both. There are more details here:

https://issues.apache.org/jira/browse/FLUME-2155?focusedCommentId=13841465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13841465
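
The rule of thumb above (no more than a 1M channel size without a backup checkpoint, the FLUME-2155 change, or both) can be written as a one-line check. This is only a restatement of the guidance in Python; the function and parameter names are hypothetical, and the 1,000,000 threshold is the figure from this message, not a hard limit enforced by Flume.

```python
# Sketch of the rule of thumb from this message; names are illustrative.
RULE_OF_THUMB_CAPACITY = 1_000_000


def exceeds_rule_of_thumb(capacity, has_backup_checkpoint, has_flume_2155):
    """True if the channel is large and has neither mitigation in place."""
    mitigated = has_backup_checkpoint or has_flume_2155
    return capacity > RULE_OF_THUMB_CAPACITY and not mitigated


print(exceeds_rule_of_thumb(150_000_000, False, False))
```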

Brock

RE: checkpoint lifecycle

Posted by Umesh Telang <Um...@bbc.co.uk>.

Hi Brock,

Our heap size is 2GB.

Thanks for the advice on data directories. Could you please let me know the heuristic for that?   (e.g. 1 data directory per N-sized channel where N is...)

Thanks also for suggesting back up checkpoints - are these something that increases the integrity of Flume's execution in an automatic fashion, or does it aid in some form of manual recovery?

Re: FLUME-2155, I've scanned through it, and will read it in more detail. I'm not sure about the unit of measurement for some of the metrics (milliseconds?), but is there any guidance as to at which order of magnitude (10^4, 10^6 or 10^8 ?) the channel size causes the replay issue to become apparent?

Thank you,
Umesh






RE: checkpoint lifecycle

Posted by Brock Noland <br...@cloudera.com>.

How large is your heap?

You will likely want two data directories per disk. Also, with a channel
that large I strongly recommend using backup checkpoints.

Additionally https://issues.apache.org/jira/browse/FLUME-2155 will be very
useful to you as well.

RE: checkpoint lifecycle

Posted by Umesh Telang <Um...@bbc.co.uk>.

Hi Hari,

The capacity of the channel is 150,000,000. The other properties of the file channel are as below:
a1.channels.s3-file-channel.type = file
a1.channels.s3-file-channel.checkpointDir = /mnt/flume-file-channels/s3-file-channel/checkpoint
a1.channels.s3-file-channel.dataDirs = /mnt/flume-file-channels/s3-file-channel/data
a1.channels.s3-file-channel.transactionCapacity = 20000
a1.channels.s3-file-channel.capacity = 150000000

We've been experimenting with the configuration. We haven't specifically noticed an increase in the checkpoint size. It's just that as the size we've observed is in the order of gigabytes, we wanted to understand how the checkpoint size would vary, if at all.

Based on what you've said, it looks like the checkpoint size is a direct function of the channel capacity. So, for a given channel capacity... as long as there is enough disk space initially provisioned, that should be sufficient for that flume agent.
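
One way to sanity-check the "direct function of capacity" reading is the sketch below. It assumes, and this is an assumption not stated anywhere in the thread, that the checkpoint file holds one 8-byte slot per unit of channel capacity, with a small header that is ignored here. Under that assumption a capacity of 150,000,000 gives a file in line with the roughly 1.2G checkpoint observed.

```python
# Assumption (not stated in the thread): the checkpoint file holds one
# 8-byte slot per unit of channel capacity; any header is ignored here.
BYTES_PER_SLOT = 8


def estimate_checkpoint_bytes(capacity):
    """Return the estimated checkpoint file size for a given capacity."""
    return capacity * BYTES_PER_SLOT


size = estimate_checkpoint_bytes(150_000_000)
print(f"{size / 1e9:.1f} GB ({size / 1024 ** 3:.2f} GiB)")
```

If the assumption holds, the file size depends only on the configured capacity and not on how many events are currently queued.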

Thanks again for clarifying!

Umesh


________________________________
From: Hari Shreedharan [hshreedharan@cloudera.com]
Sent: 29 January 2014 18:55
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle

What is the capacity of your channel? I would assume that the checkpoint size will remain the same throughout.


Thanks,
Hari


On Wednesday, January 29, 2014 at 9:37 AM, Umesh Telang wrote:

Thanks for the quick response, Hari!

We are using version 1.4.0 of Flume.

The contents and sizes of the checkpoint directory are as below:
$ ls -lh
total 1.2G
-rw-r--r-- 1 flume flume 1.2G Jan 29 17:34 checkpoint
-rw-r--r-- 1 flume flume   25 Jan 29 17:34 checkpoint.meta
-rw-r--r-- 1 flume flume    0 Jan 28 07:56 in_use.lock
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflightputs
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflighttakes

Thanks,
Umesh


________________________________
From: Hari Shreedharan [hshreedharan@cloudera.com]
Sent: 29 January 2014 17:22
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle

The checkpoint file itself should be fixed size though other files in that directory may vary in size. What version of flume are you using? Newer versions should have more files in that directory.






Re: checkpoint lifecycle

Posted by Hari Shreedharan <hs...@cloudera.com>.

What is the capacity of your channel? I would assume that the checkpoint size will remain the same throughout. 


Thanks,
Hari



RE: checkpoint lifecycle

Posted by Umesh Telang <Um...@bbc.co.uk>.

Thanks for the quick response, Hari!

We are using version 1.4.0 of Flume.

The contents and sizes of the checkpoint directory are as below:
$ ls -lh
total 1.2G
-rw-r--r-- 1 flume flume 1.2G Jan 29 17:34 checkpoint
-rw-r--r-- 1 flume flume   25 Jan 29 17:34 checkpoint.meta
-rw-r--r-- 1 flume flume    0 Jan 28 07:56 in_use.lock
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflightputs
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflighttakes

Thanks,
Umesh



Re: checkpoint lifecycle

Posted by Hari Shreedharan <hs...@cloudera.com>.

The checkpoint file itself should be a fixed size, though other files in that
directory may vary in size. What version of Flume are you using? Newer
versions should have more files in that directory.
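
One way to check the fixed-size claim on a running agent is to sample the checkpoint file's size a few times and compare the readings. The following is a minimal sketch, not something posted in this thread: the helper names sample_sizes and is_fixed_size and the example path are hypothetical, and it relies only on the Python standard library.

```python
import os
import time


def sample_sizes(path, samples=3, interval_s=1.0):
    """Return the size in bytes of `path` observed at each sampling point."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        # Sleep between readings, but not after the last one.
        if i < samples - 1:
            time.sleep(interval_s)
    return sizes


def is_fixed_size(sizes):
    """True when at least one reading exists and all readings are identical."""
    return len(set(sizes)) == 1


if __name__ == "__main__":
    # Hypothetical location; substitute the channel's actual checkpoint directory.
    checkpoint = "/path/to/checkpointDir/checkpoint"
    readings = sample_sizes(checkpoint, samples=5, interval_s=60.0)
    print(readings, "fixed" if is_fixed_size(readings) else "changing")
```

If the readings stay identical while events flow through the channel, that is consistent with the file being allocated at a fixed size rather than growing with the write-ahead log.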
