You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flume.apache.org by Hari Shreedharan <hs...@cloudera.com> on 2013/03/01 23:57:03 UTC

Re: File Channel error stops flume

>From the logs, the logs look OK and the channel is working fine. It seems to have been replaying - that is pretty much it. 

-- 
Hari Shreedharan


On Monday, February 25, 2013 at 4:28 PM, Rahul Ravindran wrote:

> I have attached the zipped log file at
> https://issues.apache.org/jira/browse/FLUME-1928
> 
> From: Hari Shreedharan <hshreedharan@cloudera.com (mailto:hshreedharan@cloudera.com)>
> To: user@flume.apache.org (mailto:user@flume.apache.org); Rahul Ravindran <rahulrv@yahoo.com (mailto:rahulrv@yahoo.com)> 
> Sent: Monday, February 25, 2013 1:30 PM
> Subject: Re: File Channel error stops flume
> 
> Can you send your full logs? I suspect the channel did a full replay because it was restarted during a restart. (If it did, the logs would show a BadCheckpointException). 
> 
> 
> Hari 
> 
> -- 
> Hari Shreedharan
> 
> On Monday, February 25, 2013 at 1:20 PM, Rahul Ravindran wrote:
> > Thanks Hari. I had waited for 20 minutes and this did not move change. Now, after more than an hour, I see it working
> > 
> > From: Hari Shreedharan <hshreedharan@cloudera.com (mailto:hshreedharan@cloudera.com)>
> > To: user@flume.apache.org (mailto:user@flume.apache.org); Rahul Ravindran <rahulrv@yahoo.com (mailto:rahulrv@yahoo.com)> 
> > Sent: Monday, February 25, 2013 12:46 PM
> > Subject: Re: File Channel error stops flume
> > 
> > Rahul, 
> > 
> > Those messages actually just suggest that your channel is replaying. The channel will complete the replay and the agent will start the sinks once the channel is ready. It might take a few minutes based on how many events you have in the channel.
> > 
> > 
> > Hari 
> > 
> > -- 
> > Hari Shreedharan
> > 
> > On Monday, February 25, 2013 at 12:07 PM, Rahul Ravindran wrote:
> > > Hi,
> > >    I modified a parameter to the HDFS sink on a flume config (added an idleInterval) on 2 machines. Things worked fine on one, and not on the other. I tried restarting flume a couple of times and I continue seeing the same log statement (bolded below) with no writes to HDFS
> > > 
> > > 25 Feb 2013 08:27:00,174 INFO  [Log-BackgroundWorker-ch2] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 8506
> > > :%
> > > 25 Feb 2013 19:55:51,577 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-17
> > > 25 Feb 2013 19:55:51,585 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume1/data/log-17
> > > 25 Feb 2013 19:55:51,588 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.tools.DirectMemoryUtils.getDefaultDirectMemorySize:113)  - Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
> > > 25 Feb 2013 19:55:51,592 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.tools.DirectMemoryUtils.allocate:47)  - Direct Memory Allocation:  Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 268435456, Remaining = 268435456
> > > 25 Feb 2013 19:55:51,634 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622812128
> > > 25 Feb 2013 19:55:51,634 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622720601
> > > 25 Feb 2013 19:55:51,654 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-18
> > > 25 Feb 2013 19:55:51,655 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622821593
> > > 25 Feb 2013 19:55:51,655 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-19
> > > 25 Feb 2013 19:55:51,656 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622678590
> > > 25 Feb 2013 19:55:51,656 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-20
> > > 25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 244707334
> > > 25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-21
> > > 25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 530601497
> > > 25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 530601497 in /flume2/data/log-21
> > > 25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-22
> > > 25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume1/data/log-18
> > > 25 Feb 2013 19:55:51,658 WARN  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:470)  - Checkpoint for file(/flume2/data/log-22) is: 1361844516782, which is beyond the requested checkpoint time: 1361844516783 and position 0
> > > 25 Feb 2013 19:55:51,659 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622674426
> > > 25 Feb 2013 19:55:51,659 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume2/data/log-23
> > > 25 Feb 2013 19:55:51,659 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume1/data/log-19
> > > 25 Feb 2013 19:55:51,659 WARN  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:470)  - Checkpoint for file(/flume2/data/log-23) is: 1361844516783, which is beyond the requested checkpoint time: 1361844516783 and position 0
> > > 25 Feb 2013 19:55:51,660 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 1622239091
> > > 25 Feb 2013 19:55:51,660 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume1/data/log-20
> > > 25 Feb 2013 19:55:51,661 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 221490603
> > > 25 Feb 2013 19:55:51,661 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying /flume1/data/log-21
> > > 25 Feb 2013 19:55:51,661 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)  - fast-forward to checkpoint position: 532696754
> > > 25 Feb 2013 19:55:52,048 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195597 in /flume1/data/log-17
> > > 25 Feb 2013 19:55:52,103 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195583 in /flume2/data/log-17
> > > 25 Feb 2013 19:55:52,308 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195536 in /flume2/data/log-18
> > > 25 Feb 2013 19:55:52,319 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195584 in /flume1/data/log-18
> > > 25 Feb 2013 19:55:52,418 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195600 in /flume2/data/log-19
> > > 25 Feb 2013 19:55:52,439 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 244816205 in /flume2/data/log-20
> > > 25 Feb 2013 19:55:52,440 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:320)  - read: 12348, put: 0, take: 0, rollback: 0, commit: 0, skip: 12348, eventCount:0
> > > 25 Feb 2013 19:55:52,441 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.replay:399)  - Rolling /flume2/data
> > > 25 Feb 2013 19:55:52,441 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.roll:811)  - Roll start /flume2/data
> > > 25 Feb 2013 19:55:52,443 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$Writer.<init>:171)  - Opened /flume2/data/log-24
> > > 25 Feb 2013 19:55:52,449 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.roll:826)  - Roll end
> > > 25 Feb 2013 19:55:52,453 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 0
> > > 25 Feb 2013 19:55:52,455 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:117)  - Updating checkpoint metadata: logWriteOrderID: 1361844516784, queueSize: 34525000, queueHead: 40625267
> > > 25 Feb 2013 19:55:52,489 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-24.meta currentPosition = 0, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,491 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint for file: /flume2/data/log-24 position: 0 logWriteOrderID: 1361844516784
> > > 25 Feb 2013 19:55:52,491 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-17
> > > 25 Feb 2013 19:55:52,497 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-17.meta currentPosition = 1622720601, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,499 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-18
> > > 25 Feb 2013 19:55:52,500 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 1623195593 in /flume1/data/log-19
> > > 25 Feb 2013 19:55:52,505 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-18.meta currentPosition = 1622821593, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,507 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-19
> > > 25 Feb 2013 19:55:52,513 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-19.meta currentPosition = 1622678590, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,514 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-20
> > > 25 Feb 2013 19:55:52,520 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-20.meta currentPosition = 244707334, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,521 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing RandomReader /flume2/data/log-21
> > > 25 Feb 2013 19:55:52,527 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-21.meta currentPosition = 530601497, logWriteOrderID = 1361844516784
> > > 25 Feb 2013 19:55:52,529 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:309)  - Queue Size after replay: 34525000 [channel=ch2]
> > > 25 Feb 2013 19:55:52,594 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:89)  - Monitoried counter group for type: CHANNEL, name: ch2, registered successfully.
> > > 25 Feb 2013 19:55:52,594 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:73)  - Component type: CHANNEL, name: ch2 started
> > > 25 Feb 2013 19:55:52,619 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 222290119 in /flume1/data/log-20
> > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
>