Posted to dev@flume.apache.org by "Juhani Connolly (JIRA)" <ji...@apache.org> on 2013/02/27 09:59:12 UTC

[jira] [Commented] (FLUME-1929) CheckpointRebuilder main method does not work

    [ https://issues.apache.org/jira/browse/FLUME-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588132#comment-13588132 ] 

Juhani Connolly commented on FLUME-1929:
----------------------------------------

This appears to hang.

Steps followed:
- start up Flume, feed it some data, then kill -9 it to try to force an inconsistent checkpoint
- delete in-use.lock, checkpoint and checkpoint.meta
- run the checkpoint rebuilder; the final command through our script is below (note that I patched -c to become -h)

+ exec /usr/local/java/bin/java -server -XX:OnOutOfMemoryError=/tmp/stop.sh -XX:MaxPermSize=24m -XX:PermSize=24m -XX:SurvivorRatio=8 -Xmn96m -Xmx512m -Xms128m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=172.28.202.76 -Dflume.monitoring.type=GANGLIA -Dflume.monitoring.hosts=pat-log-om01:8649 -cp '/etc/flume/conf:/usr/lib/flume/lib/*' -Djava.library.path= org.apache.flume.channel.file.CheckpointRebuilder -h /tmp/flume-check -l /tmp/flume-data -t 5000000
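
For anyone repeating the second step, the cleanup can be sketched in Python as below. This is only an illustration: the three file names come from the steps above, but the helper name remove_stale_files is hypothetical, and which directories actually hold in-use.lock is an assumption (the comment does not say).

```python
from pathlib import Path

# File names taken from the steps above.
STALE_FILES = ("in-use.lock", "checkpoint", "checkpoint.meta")


def remove_stale_files(directories):
    """Delete the lock and checkpoint files from each directory, if present.

    Hypothetical helper for illustration; returns the paths actually removed.
    Other files (for example log-3, log-4) are left untouched.
    """
    removed = []
    for directory in directories:
        for name in STALE_FILES:
            path = Path(directory) / name
            if path.is_file():
                path.unlink()
                removed.append(path)
    return removed
```

With the paths from the command above, the call would be remove_stale_files(["/tmp/flume-check", "/tmp/flume-data"]), assuming those are the checkpoint and data directories passed as -h and -l.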



Full logs are as below:

27 Feb 2013 17:51:35,995 INFO  [main] (org.apache.flume.channel.file.EventQueueBackingStoreFile.<init>:71)  - Preallocated /tmp/flume-check/checkpoint to 40008232 for capacity 5000000
27 Feb 2013 17:51:36,004 INFO  [main] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:47)  - Starting up with /tmp/flume-check/checkpoint and /tmp/flume-check/checkpoint.meta
27 Feb 2013 17:51:36,078 INFO  [main] (org.apache.flume.channel.file.CheckpointRebuilder.rebuild:64)  - Attempting to fast replay the log files.
27 Feb 2013 17:51:36,112 INFO  [main] (org.apache.flume.tools.DirectMemoryUtils.getDefaultDirectMemorySize:113)  - Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
27 Feb 2013 17:51:36,117 INFO  [main] (org.apache.flume.tools.DirectMemoryUtils.allocate:47)  - Direct Memory Allocation:  Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 526843904, Remaining = 526843904
27 Feb 2013 17:51:36,866 INFO  [main] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 150457 in /tmp/flume-data/log-3
27 Feb 2013 17:51:36,884 INFO  [main] (org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - Encountered EOF at 4095 in /tmp/flume-data/log-4
27 Feb 2013 17:51:36,887 INFO  [main] (org.apache.flume.channel.file.CheckpointRebuilder.rebuild:151)  - Replayed 0 events using fast replay logic.
27 Feb 2013 17:51:36,889 INFO  [main] (org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:108)  - Start checkpoint for /tmp/flume-check/checkpoint, elements to sync = 0
27 Feb 2013 17:51:36,896 INFO  [main] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:120)  - Updating checkpoint metadata: logWriteOrderID: 1361955096886, queueSize: 0, queueHead: 0
27 Feb 2013 17:51:36,906 INFO  [main] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-3.meta currentPosition = 0, logWriteOrderID = 1361955096886
27 Feb 2013 17:51:36,908 INFO  [main] (org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - Updating log-4.meta currentPosition = 4095, logWriteOrderID = 1361955096886
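
Two of the numbers in the log above can be sanity-checked with a little arithmetic. The sketch below is an inference from the logged values, not something the log states: it assumes each queue slot is one 8-byte long, and it assumes logWriteOrderID is a millisecond timestamp. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log lines above.
CAPACITY = 5_000_000
PREALLOCATED_BYTES = 40_008_232
LOG_WRITE_ORDER_ID = 1_361_955_096_886

# Assumption: each queue slot is stored as one 8-byte long.
SLOT_BYTES = 8


def overhead_bytes(preallocated, capacity, slot_bytes=SLOT_BYTES):
    """Bytes left over after reserving slot_bytes for every slot."""
    return preallocated - capacity * slot_bytes


def millis_to_utc(millis):
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=millis)


if __name__ == "__main__":
    print(overhead_bytes(PREALLOCATED_BYTES, CAPACITY))   # 8232
    print(millis_to_utc(LOG_WRITE_ORDER_ID).isoformat())  # 2013-02-27T08:51:36.886000+00:00
```

Under those assumptions the checkpoint file is 5000000 slots of 8 bytes plus 8232 bytes of overhead, and the logWriteOrderID decodes to 08:51:36.886 UTC, which would line up with the 17:51:36 log timestamps if the host clock is at UTC+9 (the log does not state the time zone).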



Some diagnostics:

# lsof +d /tmp/flume-data
COMMAND   PID            USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash    15144 juhani_connolly  cwd    DIR  252,0     4096 132605 /tmp/flume-data
sudo    16392            root  cwd    DIR  252,0     4096 132605 /tmp/flume-data
lsof    16394            root  cwd    DIR  252,0     4096 132605 /tmp/flume-data
lsof    16395            root  cwd    DIR  252,0     4096 132605 /tmp/flume-data


Attaching a thread dump.
                
> CheckpointRebuilder main method does not work
> ---------------------------------------------
>
>                 Key: FLUME-1929
>                 URL: https://issues.apache.org/jira/browse/FLUME-1929
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>            Priority: Minor
>         Attachments: FLUME-1929.patch
>
>
> Based on the discussion in this thread: http://apache.markmail.org/thread/567cshrmz35okrq3 - the main method in CheckpointRebuilder was not updated for the new data file format.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira