Posted to log4j-user@logging.apache.org by Arkin Yetis <ar...@gmail.com> on 2014/04/04 21:04:36 UTC

Flume Appender failure due to filesystem issue

We use the Flume Appender. Our logging stopped after a certain point in
time and we noticed the exception at the end of this message in our
application logs. It looks like there was an issue with the filesystem. But
although the filesystem has recovered, the appender (or probably the
persistence mechanism it uses) was stuck in this state and it took an
application restart for it to continue logging. It does not look like there
is a recovery mechanism or if there is one it failed.
Would you like me to open a log4j JIRA ticket for this? Or is this
something that can be prevented by something simple you can share over
e-mail such as a certain configuration setting?

Thanks,
- Arkin

Exception stack is:
1. Stale NFS file handle (java.io.IOException)
  java.io.RandomAccessFile:-2 (null)
2. Environment invalid because of previous exception: (JE 5.0.73)
/app/logs/abs-workflow/flumeDir java.io.IOException: Stale NFS file handle
LOG_READ: IOException on read, log is likely invalid. Environment is
invalid and must be closed. fetchTarget of 0x542/0x4af13c parent IN=5 IN
class=com.sleepycat.je.tree.BIN lastFullVersion=0x543/0x62d6c5
lastLoggedVersion=0x543/0x62d6c5 parent.getDirty()=true state=0
(com.sleepycat.je.EnvironmentFailureException)
  com.sleepycat.je.log.FileManager:1883 (null)

********************************************************************************
Root Exception stack trace: java.io.IOException: Stale NFS file handle
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
    at com.sleepycat.je.log.FileManager.readFromFileInternal(FileManager.java:1918)
    at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1869)
    at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1807)
    at com.sleepycat.je.log.FileSource.getBytes(FileSource.java:56)
    at com.sleepycat.je.log.LogManager.getLogEntryFromLogSource(LogManager.java:919)
    at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848)
    at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:809)
    at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1412)
    at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1251)
    at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2261)
    at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466)
    at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593)
    at com.sleepycat.je.cleaner.UtilizationProfile.getObsoleteDetail(UtilizationProfile.java:632)
    at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:439)
    at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:289)
    at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:148)
    at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:163)
    at java.lang.Thread.run(Thread.java:662)
********************************************************************************

Re: Flume Appender failure due to filesystem issue

Posted by Ralph Goers <ra...@dslextreme.com>.
That is probably a good idea anyway. You will get much better performance from local disk.  The only downside is if your local disk is small.
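
To make the local-disk suggestion checkable, here is a rough sketch of a startup check that reports which kind of filesystem backs the appender's data directory. It is not part of log4j or Flume. The class name DataDirCheck and the list of type names treated as "network" are assumptions for illustration, and the string returned by FileStore.type() is platform-specific (on Linux it is typically something like "nfs" or "nfs4" for NFS mounts), so treat the result as a hint rather than a guarantee.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not a log4j or Flume class.
final class DataDirCheck {

    // Heuristic only: type names vary by platform and JDK, and this list is an assumption.
    public static boolean looksLikeNetworkFs(String fsType) {
        if (fsType == null) {
            return false;
        }
        String t = fsType.toLowerCase(Locale.ROOT);
        return t.startsWith("nfs") || t.startsWith("smb") || t.equals("cifs");
    }

    // Returns the filesystem type of the store that holds an existing directory.
    public static String fileSystemType(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.type();
    }

    public static void main(String[] args) throws IOException {
        // Pass the appender's data directory as the first argument; defaults to the working directory.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        String type = fileSystemType(dataDir);
        System.out.println(dataDir.toAbsolutePath() + " is on a filesystem of type: " + type);
        if (looksLikeNetworkFs(type)) {
            System.out.println("Warning: this looks like a network filesystem.");
        }
    }
}
```

Running it against the directory from the report (/app/logs/abs-workflow/flumeDir) before starting the application would show whether the persistent store sits on an NFS mount.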

Ralph

On Apr 5, 2014, at 5:52 PM, Arkin Yetis <ar...@gmail.com> wrote:

> Yes, I was referring to the Flume Persistent Appender. I will open an
> enhancement request in JIRA. In the meantime, we will consider using local
> disk instead of a remote filesystem to decrease the likelihood of the issue
> occurring.
> 
> Thanks,
> - Arkin

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-user-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-user-help@logging.apache.org


Re: Flume Appender failure due to filesystem issue

Posted by Arkin Yetis <ar...@gmail.com>.
Yes, I was referring to the Flume Persistent Appender. I will open an
enhancement request in JIRA. In the meantime, we will consider using local
disk instead of a remote filesystem to decrease the likelihood of the issue
occurring.

Thanks,
- Arkin


On Fri, Apr 4, 2014 at 2:46 PM, Ralph Goers <ra...@dslextreme.com> wrote:

> I'm assuming you are using the Flume Persistent Appender, which uses
> Berkeley DB, based on the logs below.  From the log it appears that Berkeley
> DB's file handle is stale and needs to be closed and reopened.  I haven't
> checked, but Berkeley DB might have a setting for this; otherwise the Flume
> Persistent Manager would need to deal with this condition. That would be a
> big change, as the Database object is currently immutable.
>
> Ralph

Re: Flume Appender failure due to filesystem issue

Posted by Ralph Goers <ra...@dslextreme.com>.
I’m assuming you are using the Flume Persistent Appender, which uses Berkeley DB, based on the logs below.  From the log it appears that Berkeley DB’s file handle is stale and needs to be closed and reopened.  I haven’t checked, but Berkeley DB might have a setting for this; otherwise the Flume Persistent Manager would need to deal with this condition. That would be a big change, as the Database object is currently immutable.
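
For illustration only, the "close and reopen" idea could look roughly like the sketch below. It is not the Flume Persistent Manager code and it does not call Berkeley DB. EventStore, StoreInvalidException, and ReopeningStore are hypothetical names standing in for the database handle, the "environment invalid" failure, and a wrapper that holds the handle in a mutable field so it can be swapped out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the "environment is invalid and must be closed" failure.
class StoreInvalidException extends RuntimeException {
    StoreInvalidException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the persistent store handle.
interface EventStore {
    void put(String event);
    void close();
}

// Keeps the handle in a mutable field so a broken one can be replaced.
final class ReopeningStore {
    private final Supplier<EventStore> opener;
    private EventStore current;
    private int reopenCount;

    public ReopeningStore(Supplier<EventStore> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    public synchronized void put(String event) {
        try {
            current.put(event);
        } catch (StoreInvalidException invalid) {
            // The old handle is unusable: close it, open a fresh one, retry once.
            try {
                current.close();
            } catch (RuntimeException ignored) {
                // Closing a broken handle may fail too; nothing more to do with it.
            }
            current = opener.get();
            reopenCount++;
            current.put(event); // a second failure propagates to the caller
        }
    }

    public synchronized int reopenCount() {
        return reopenCount;
    }
}

final class ReopenDemo {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        int[] opened = {0};
        // Fake opener: the first handle always fails, later handles work.
        Supplier<EventStore> opener = () -> {
            final int id = ++opened[0];
            return new EventStore() {
                @Override public void put(String event) {
                    if (id == 1) {
                        throw new StoreInvalidException("stale file handle (simulated)");
                    }
                    written.add(event);
                }
                @Override public void close() { }
            };
        };
        ReopeningStore store = new ReopeningStore(opener);
        store.put("event-1");
        System.out.println("reopened " + store.reopenCount() + " time(s), stored " + written);
    }
}
```

The sketch retries only once, so a filesystem that is still unavailable surfaces the error to the caller instead of looping. A real fix would also have to decide what happens to events that were buffered in the broken store.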

Ralph

On Apr 4, 2014, at 12:04 PM, Arkin Yetis <ar...@gmail.com> wrote:

> We use the Flume Appender. Our logging stopped after a certain point in
> time and we noticed the exception at the end of this message in our
> application logs. It looks like there was an issue with the filesystem. But
> although the filesystem has recovered, the appender (or probably the
> persistence mechanism it uses) was stuck in this state and it took an
> application restart for it to continue logging. It does not look like there
> is a recovery mechanism or if there is one it failed.
> Would you like me to open a log4j JIRA ticket for this? Or is this
> something that can be prevented by something simple you can share over
> e-mail such as a certain configuration setting?
> 
> Thanks,
> - Arkin
> 
> Exception stack is:
> 1. Stale NFS file handle (java.io.IOException)
>  java.io.RandomAccessFile:-2 (null)
> 2. Environment invalid because of previous exception: (JE 5.0.73)
> /app/logs/abs-workflow/flumeDir java.io.IOException: Stale NFS file handle
> LOG_READ: IOException on read, log is likely invalid. Environment is
> invalid and must be closed. fetchTarget of 0x542/0x4af13c parent IN=5 IN
> class=com.sleepycat.je.tree.BIN lastFullVersion=0x543/0x62d6c5
> lastLoggedVersion=0x543/0x62d6c5 parent.getDirty()=true state=0
> (com.sleepycat.je.EnvironmentFailureException)
>  com.sleepycat.je.log.FileManager:1883 (null)
> 
> ********************************************************************************
> Root Exception stack trace: java.io.IOException: Stale NFS file handle
>    at java.io.RandomAccessFile.readBytes(Native Method)
>    at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
>    at com.sleepycat.je.log.FileManager.readFromFileInternal(FileManager.java:1918)
>    at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1869)
>    at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1807)
>    at com.sleepycat.je.log.FileSource.getBytes(FileSource.java:56)
>    at com.sleepycat.je.log.LogManager.getLogEntryFromLogSource(LogManager.java:919)
>    at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848)
>    at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:809)
>    at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1412)
>    at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1251)
>    at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2261)
>    at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466)
>    at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593)
>    at com.sleepycat.je.cleaner.UtilizationProfile.getObsoleteDetail(UtilizationProfile.java:632)
>    at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:439)
>    at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:289)
>    at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:148)
>    at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:163)
>    at java.lang.Thread.run(Thread.java:662)
> ********************************************************************************

