You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flume.apache.org by Rahul Ravindran <ra...@yahoo.com> on 2013/05/07 06:40:59 UTC

Usage of use-fast-replay for FileChannel

Hi,
   For FileChannel, how much of a performance improvement in replay times were observed with use-fast-replay? We currently have use-fast-replay set to false and were replaying about 6 G of data. We noticed replay times of about one hour. I looked at the code and it appears that fast-replay does not guarantee the same ordering of events during replay. Is this accurate? Are there any other downsides of using fast-replay? Any stability concerns?
Thanks,
~Rahul.

Re: Usage of use-fast-replay for FileChannel

Posted by Rahul Ravindran <ra...@yahoo.com>.
I searched through the logs but did not find a BadCheckpointException. I extrapolated the 6G from the size of the log-* files being read. Below is a snippet of the logs. As you can see from the timestamp of the log lines, the channel ch3 was started at time 3:32 and channel startup activity completed only at 4:40

07 May 2013 03:32:40,598 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-297.meta currentPosition = 1455187648, logWriteOrderID = 1385582495370
07 May 2013 03:32:40,599 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.FileChannel.start:309)  - Queue Size after replay: 58497343 [channel=ch3]
07 May 2013 03:32:40,628 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:89)  - Monitoried counter group for type: CHANNEL, name: ch3, registered successfully.
07 May 2013 03:32:40,628 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:73)  - Component type: CHANNEL, name: ch3 started
07 May 2013 03:32:40,887 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195640 in /flume2/data/log-861
07 May 2013 03:32:45,972 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195632 in /flume2/data/log-862
07 May 2013 03:32:47,777 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195646 in /flume2/data/log-863
07 May 2013 03:32:49,349 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195620 in /flume2/data/log-864
07 May 2013 04:17:37,858 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1442343132 in /flume2/data/log-865
07 May 2013 04:17:37,859 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.ReplayHandler.replayLog:320)  - read: 1829658, put: 0, take: 45223, rollback: 1, commit: 11, skip: 1784423, eventCount:44000
07 May 2013 04:17:37,860 ERROR [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.ReplayHandler.replayLog:354)  - Pending takes 1473 exist after the end of replay. Duplicate messages will exist in destination.
07 May 2013 04:17:37,860 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.Log.replay:399)  - Rolling /flume2/data
07 May 2013 04:17:37,860 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.Log.roll:811)  - Roll start /flume2/data
07 May 2013 04:17:37,861 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$Writer.<init>:171)  - Opened /flume2/data/log-866
07 May 2013 04:17:37,862 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.Log.roll:826)  - Roll end
07 May 2013 04:17:37,863 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 42527
07 May 2013 04:17:37,865 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:117)  - Updating checkpoint metadata: logWriteOrderID: 1385582495371, queueSize: 46200121, queueHead: 65890565
07 May 2013 04:17:37,906 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-866.meta currentPosition = 0, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,907 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint for file: /flume2/data/log-866 position: 0 logWriteOrderID: 1385582495371
07 May 2013 04:17:37,908 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-855
07 May 2013 04:17:37,913 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-855.meta currentPosition = 1592727635, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,915 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-856
07 May 2013 04:17:37,921 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-856.meta currentPosition = 1601406592, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,922 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-857
07 May 2013 04:17:37,928 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-857.meta currentPosition = 1608758564, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,929 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-858
07 May 2013 04:17:37,935 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-858.meta currentPosition = 1558785242, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,936 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-859
07 May 2013 04:17:37,941 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-859.meta currentPosition = 586542373, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,943 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-861
07 May 2013 04:17:37,949 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-861.meta currentPosition = 1556011472, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,951 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-862
07 May 2013 04:17:37,956 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-862.meta currentPosition = 1563024320, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,958 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-863
07 May 2013 04:17:37,963 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-863.meta currentPosition = 1612213007, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,965 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-864
07 May 2013 04:17:37,971 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-864.meta currentPosition = 1613618738, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,972 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-865
07 May 2013 04:17:37,977 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-865.meta currentPosition = 1440488562, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,979 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.channel.file.FileChannel.start:309)  - Queue Size after replay: 46200121 [channel=ch2]
07 May 2013 04:17:37,979 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:89)  - Monitoried counter group for type: CHANNEL, name: ch2, registered successfully.
07 May 2013 04:17:37,980 INFO  [lifecycleSupervisor-1-3] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:73)  - Component type: CHANNEL, name: ch2 started
07 May 2013 04:40:37,198 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1522974283 in /flume1/data/log-861
07 May 2013 04:40:37,199 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.ReplayHandler.replayLog:320)  - read: 1249010, put: 0, take: 121197, rollback: 0, commit: 31, skip: 1127782, eventCount:124000
07 May 2013 04:40:37,199 ERROR [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.ReplayHandler.replayLog:354)  - Pending takes 3504 exist after the end of replay. Duplicate messages will exist in destination.
07 May 2013 04:40:37,199 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.Log.replay:399)  - Rolling /flume1/data
07 May 2013 04:40:37,200 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.Log.roll:811)  - Roll start /flume1/data
07 May 2013 04:40:37,200 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$Writer.<init>:171)  - Opened /flume1/data/log-862
07 May 2013 04:40:37,202 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.Log.roll:826)  - Roll end
07 May 2013 04:40:37,203 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - Start checkpoint for /flume1/checkpoint/checkpoint, elements to sync = 120496
07 May 2013 04:40:37,204 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:117)  - Updating checkpoint metadata: logWriteOrderID: 1385582495372, queueSize: 29456099, queueHead: 22675305
07 May 2013 04:40:37,238 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-862.meta currentPosition = 0, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint for file: /flume1/data/log-862 position: 0 logWriteOrderID: 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume1/data/log-855
07 May 2013 04:40:37,246 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-855.meta currentPosition = 1547225990, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,247 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume1/data/log-856
07 May 2013 04:40:37,253 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-856.meta currentPosition = 892596719, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,255 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume1/data/log-857
07 May 2013 04:40:37,260 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-857.meta currentPosition = 1559591451, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,262 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume1/data/log-858
07 May 2013 04:40:37,267 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-858.meta currentPosition = 1550429668, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,269 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume1/data/log-859
07 May 2013 04:40:37,274 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-859.meta currentPosition = 1616923189, logWriteOrderID = 1385582495372


________________________________
 From: Hari Shreedharan <hs...@cloudera.com>
To: "user@flume.apache.org" <us...@flume.apache.org>; Rahul Ravindran <ra...@yahoo.com> 
Sent: Monday, May 6, 2013 9:57 PM
Subject: Re: Usage of use-fast-replay for FileChannel
 


Did you have an issue with the checkpoint that the entire 6G of data was replayed (look for BadCheckpointException in the logs to figure out if the channel was stopped in middle of a checkpoint)?

With the next version of Flume, you should be able to recover even if the channel stopped while the checkpoint was being written.

Fast Replay will try to maintain order, but it will require a massive amount of memory to run if you have a large number of events. Also, fast replay will only run if the checkpoint is corrupt/does not exist.

Hari




On Mon, May 6, 2013 at 9:40 PM, Rahul Ravindran <ra...@yahoo.com> wrote:

Hi,
>   For FileChannel, how much of a performance improvement in replay times were observed with use-fast-replay? We currently have use-fast-replay set to false and were replaying about 6 G of data. We noticed replay times of about one hour. I looked at the code and it appears that fast-replay does not guarantee the same ordering of events during replay. Is this accurate? Are there any other downsides of using fast-replay? Any stability concerns?
>Thanks,
>~Rahul.

Re: Usage of use-fast-replay for FileChannel

Posted by Hari Shreedharan <hs...@cloudera.com>.
Did you have an issue with the checkpoint that the entire 6G of data was
replayed (look for BadCheckpointException in the logs to figure out if the
channel was stopped in middle of a checkpoint)?

With the next version of Flume, you should be able to recover even if the
channel stopped while the checkpoint was being written.

Fast Replay will try to maintain order, but it will require a massive
amount of memory to run if you have a large number of events. Also, fast
replay will only run if the checkpoint is corrupt/does not exist.

Hari


On Mon, May 6, 2013 at 9:40 PM, Rahul Ravindran <ra...@yahoo.com> wrote:

> Hi,
>    For FileChannel, how much of a performance improvement in replay times
> were observed with use-fast-replay? We currently have use-fast-replay set
> to false and were replaying about 6 G of data. We noticed replay times of
> about one hour. I looked at the code and it appears that fast-replay does
> not guarantee the same ordering of events during replay. Is this accurate?
> Are there any other downsides of using fast-replay? Any stability concerns?
> Thanks,
> ~Rahul.
>