Posted to user@flume.apache.org by Chris Neal <cw...@gmail.com> on 2013/03/21 21:21:02 UTC
Simple HDFS Sink file rolling question please.
Hi :)
I have an ExecSource running tail -F on a bunch of log files that log4j
rotates nightly. I want my HDFS Sink to roll its files when log4j rolls
them. I tried setting all the "roll" parameters to 0, thinking a new file
handle from the ExecSource would cause the current file in HDFS to be
closed and a new file to be created. Instead, I'm seeing only the new file
created, and the previous day's file is still there as an unclosed .tmp file.
What configuration would achieve the behavior I'm after?
I was thinking a rollInterval of 24 hours, but wouldn't that cause HDFS to
roll the file at a different time than log4j rolled it?
Thanks for the time :)
Here is my HDFS Sink setup currently:
# hdfs-hadoopjt01_1-sink properties
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.type = hdfs
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.path = hdfs://nameservice1/%{path}
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.filePrefix = %{filename}.%Y-%m-%d_1
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.batchSize = 10000
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.threadsPoolSize = 8
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollTimerPoolSize = 5
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.codeC = GzipCodec
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.fileType = CompressedStream
Re: Simple HDFS Sink file rolling question please.
Posted by Mike Percy <mp...@cloudera.com>.
Hi Chris,
Check out the hdfs.idleTimeout parameter. It closes a file once it has
received no events for the given number of seconds. Maybe set it to
5 minutes (i.e. hdfs.idleTimeout = 300) or something.
http://flume.apache.org/FlumeUserGuide.html
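A minimal sketch of how that might look alongside the existing roll settings, reusing the agent and sink names from the original config. The 300-second value is only the suggestion above, not a tuned number, and this assumes the events carry a timestamp header so that the %Y-%m-%d in filePrefix sends events arriving after midnight to a new file:

```properties
# Leave interval-, size-, and count-based rolling disabled, as in the original setup
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
# Close a file after it has received no events for 300 seconds
# (0, the default, disables closing of idle files)
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.idleTimeout = 300
```

With something like this in place, the previous day's file should be closed and lose its .tmp suffix roughly five minutes after events stop arriving for it, rather than at a fixed clock time.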
Regards,
Mike