Posted to dev@flume.apache.org by Edward Sargisson <es...@pobox.com> on 2013/05/08 18:04:47 UTC

spooldir source reading Flume itself and thinking the file has changed (1.3.1)

Hi all,
I've been experimenting with using Flume to transport Flume's own logs.

My setup is that I have the apache-log4j-extras RollingFileAppender rolling
the file every minute into a spool directory. Then I have Flume with the
spooldir source reading that directory.

Very often I see:
java.lang.IllegalStateException: File name has been re-used with different files. Spooling assumption violated for /opt/castellan/log/spool/castellan-reader.20130430T2055.log.COMPLETED
        at org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:272)

Then it never recovers, repeatedly saying:
2013-05-08 15:57:19,233 ERROR [pool-10-thread-1] org.apache.flume.source.SpoolDirectorySource Uncaught exception in Runnable
java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
        at java.io.BufferedReader.readLine(BufferedReader.java:310)
        at java.io.BufferedReader.readLine(BufferedReader.java:382)
        at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:180)

I note that 1.4 has extensive changes to the spooldir source but nothing
that really seemed to speak to this.

Does anybody else see this? Anything I should look for? Or is this some
kind of race condition between log4j and spooldir?

Cheers,
Edward

Re: spooldir source reading Flume itself and thinking the file has changed (1.3.1)

Posted by Mike Percy <mp...@apache.org>.
Hi Edward,
Spooldir assumes that the file is atomically moved into the directory in
question. Maybe we should make it wait until the last mod time is more than
10 seconds old, or something like that, to avoid situations where a
readdir + stat catches the file during a copy operation and thinks you're
changing it. If you're sure that the file name is not being reused, then
maybe it's a bug.
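
A minimal sketch of the two ideas above, assuming Java 7 or later and a
staging directory on the same file system as the spool directory. This is
not Flume code: SpoolHandoff, isQuiet, and moveIntoSpool are hypothetical
names, and the quiet period is whatever the caller passes in (10 seconds
is only the figure floated above).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

// Hypothetical helper for illustration only; not part of Flume.
final class SpoolHandoff {
    private SpoolHandoff() {}

    /**
     * Reader side: returns true if the file's last modification is at
     * least quietMillis older than nowMillis, i.e. nobody appears to be
     * writing to it any more.
     */
    static boolean isQuiet(Path file, long quietMillis, long nowMillis)
            throws IOException {
        FileTime lastModified = Files.getLastModifiedTime(file);
        return nowMillis - lastModified.toMillis() >= quietMillis;
    }

    /**
     * Producer side: renames a fully written file into the spool
     * directory in a single step. ATOMIC_MOVE throws
     * AtomicMoveNotSupportedException when the move cannot be done
     * atomically (for example, across file systems). Whether an existing
     * target is replaced is implementation specific, so file names
     * should still be unique.
     */
    static Path moveIntoSpool(Path finished, Path spoolDir)
            throws IOException {
        Path target = spoolDir.resolve(finished.getFileName());
        return Files.move(finished, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A reader would call isQuiet(file, 10_000L, System.currentTimeMillis())
before picking a file up; a producer would write into the staging
directory and call moveIntoSpool only after closing the file.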

We require file names to be unique because we don't have a great way to read
the inode with Java.
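
For what it's worth, Java 7's NIO.2 API does expose a file identity through
BasicFileAttributes.fileKey(). On Unix-like platforms the key is typically
derived from the device ID and inode, but it may be null where the platform
or file system does not provide one. The following is only a sketch of how
a reader could use it to tell "same name, different file" apart from "same
file"; FileIdentity and its methods are hypothetical names, and this is not
what the spooldir source does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only; not part of Flume.
final class FileIdentity {
    private FileIdentity() {}

    /**
     * Returns the platform's file key for the given path, or null if the
     * platform or file system does not provide one.
     */
    static Object fileKeyOf(Path file) throws IOException {
        return Files.readAttributes(file, BasicFileAttributes.class).fileKey();
    }

    /**
     * Returns true only if a key was recorded earlier and the file now at
     * this path reports an equal key. A null key is treated as "unknown",
     * so it never counts as a match.
     */
    static boolean sameFile(Object keyBefore, Path file) throws IOException {
        Object keyNow = fileKeyOf(file);
        return keyBefore != null && keyBefore.equals(keyNow);
    }
}
```

Because the key can be null, a check like this could only supplement the
unique-file-name requirement, not replace it.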

Mike

