Posted to java-user@lucene.apache.org by Kristian Ottosen <ko...@yawah.com> on 2005/04/02 10:29:30 UTC

Lucene on Linux problem...

Hello,

I have really been pulling my hair out over this for a while. The problem only
occurs when using FSDirectory on some Linux systems (some Debian, SuSE and
Red Hat - but not all) and never under Windows, Solaris or Mac OS X.

What happens is that Lucene for some reason tries to read a segment which
was just merged into another segment and therefore deleted and fails with a
FileNotFoundException.
 
A few facts: Only a single thread is involved. If we use a RAM-based store
(RAMDirectory) there is no problem at all. There also seems to be a problem
related to locking using the lock files: we get timeouts even though there
is only a single thread working on the index. Switching the compound file
format on/off makes no difference. The problem may not appear immediately -
but only after several thousand documents have been indexed in a row using a
combination of IndexSearcher (check for duplicates) and IndexWriter (add
document) operations.

I wonder if there is a general problem running Lucene on top of some of the
journaling file systems like ReiserFS or ext3?

Any help or hints are greatly appreciated. Lucene is an excellent piece of
software.

Best Regards 
Kristian



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene on Linux problem...

Posted by Chris Lamprecht <cl...@gmail.com>.
I've also been getting this error very rarely on a production system
with lots of search queries.  The system is Fedora 1 using ext3.  It
happens in the call to IndexReader.getCurrentVersion(indexPath).  I
think it happens when the separate indexing process is running or has
just run (from a separate JVM).  Here is the stack trace:

Caused by: java.io.FileNotFoundException: /lucene/segments (No such file or directory)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
       at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
       at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
       at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
       at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:106)
       at org.apache.lucene.index.IndexReader.getCurrentVersion(IndexReader.java:214)
       at org.apache.lucene.index.IndexReader.getCurrentVersion(IndexReader.java:200)
       at org.apache.lucene.index.IndexReader.getCurrentVersion(IndexReader.java:187)
[...]

On Apr 2, 2005 2:29 AM, Kristian Ottosen <ko...@yawah.com> wrote:
> Hello,
> 
> I have really been pulling my hair out over this for a while. The problem only
> occurs when using FSDirectory on some Linux systems (some Debian, SuSE and
> Red Hat - but not all) and never under Windows, Solaris or Mac OS X.
> 
> What happens is that Lucene for some reason tries to read a segment which
> was just merged into another segment and therefore deleted and fails with a
> FileNotFoundException.
> 
> A few facts: Only a single thread is involved. If we use a RAM-based store
> (RAMDirectory) there is no problem at all. There also seems to be a problem
> related to locking using the lock files: we get timeouts even though there
> is only a single thread working on the index. Switching the compound file
> format on/off makes no difference. The problem may not appear immediately -
> but only after several thousand documents have been indexed in a row using a
> combination of IndexSearcher (check for duplicates) and IndexWriter (add
> document) operations.
> 
> I wonder if there is a general problem running Lucene on top of some of the
> journaling file systems like ReiserFS or ext3?
> 
> Any help or hints are greatly appreciated. Lucene is an excellent piece of
> software.
> 
> Best Regards
> Kristian
> 



Re: Lucene on Linux problem...

Posted by Paul Elschot <pa...@xs4all.nl>.
On Saturday 02 April 2005 10:29, Kristian Ottosen wrote:
> Hello,
> 
> I have really been pulling my hair out over this for a while. The problem only
> occurs when using FSDirectory on some Linux systems (some Debian, SuSE and
> Red Hat - but not all) and never under Windows, Solaris or Mac OS X.
> 
> What happens is that Lucene for some reason tries to read a segment which
> was just merged into another segment and therefore deleted and fails with a
> FileNotFoundException.
>  
> A few facts: Only a single thread is involved. If we use a RAM-based store
> (RAMDirectory) there is no problem at all. There also seems to be a problem
> related to locking using the lock files: we get timeouts even though there
> is only a single thread working on the index. Switching the compound file
> format on/off makes no difference. The problem may not appear immediately -
> but only after several thousand documents have been indexed in a row using a
> combination of IndexSearcher (check for duplicates) and IndexWriter (add
> document) operations.
> 
> I wonder if there is a general problem running Lucene on top of some of the
> journaling file systems like ReiserFS or ext3?

The only reason I can think of for a FileNotFoundException is that the
corresponding close() of the file has not terminated yet, or was not done
at all.

I don't know the precise file system semantics in relation to a single Linux
process, but if these semantics allow a file closed by a process to be
(temporarily) invisible to that same process, one could end up with
a FileNotFoundException.

AFAIK this is the first time something like this has been reported, so another
source of the problem could be your hard disk. But I would not expect a
journalling file system to ignore errors like this, i.e. a journalling file
system should give some indication of an error situation in case of a
hardware problem. Could you check the system log for
file system messages?

Going back to the file system semantics allowing a temporarily invisible
file: you might change the Lucene Java code to perform a file system sync
(via a shell) in case of the FileNotFoundException, and then retry opening
the file.
If that helps, it's the file system semantics at fault. If there is such a
thing as a file system sync in the java.io package, this would have to be
added to Lucene. Otherwise it would probably be useful to inform the
maintainers of the JVM.
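The sync-and-retry idea above could be sketched in plain Java roughly like this (a hypothetical helper, not part of Lucene; the shell-out to `sync`, the retry count and the sleep interval are all assumptions):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

class RetryOpen {
    /** Try to open a file; on FileNotFoundException, ask the OS to flush
     *  file system state via the shell and retry a few times before
     *  re-throwing. */
    static RandomAccessFile openWithSyncRetry(File f, int attempts)
            throws Exception {
        for (int i = 0; ; i++) {
            try {
                return new RandomAccessFile(f, "r");
            } catch (FileNotFoundException e) {
                if (i >= attempts - 1) throw e; // give up, report original error
                // Flush pending file system state, then retry after a pause.
                Runtime.getRuntime().exec("sync").waitFor();
                Thread.sleep(100);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("segments", ".tmp");
        RandomAccessFile raf = openWithSyncRetry(f, 3);
        System.out.println("opened: " + (raf != null)); // prints "opened: true"
        raf.close();
        f.delete();
    }
}
```

In the happy path the file opens on the first attempt and `sync` is never run, so the shell-out only costs anything when the FileNotFoundException actually occurs. If the retry then succeeds, that would point at the file system semantics as Paul suggests.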

Regards,
Paul Elschot.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Lucene on Linux problem...

Posted by Kristian Ottosen <ko...@yawah.com>.
> I remember strange problems (with mkdir in my case) when using java
> servlets within a Tomcat that was started by the apache daemon.
> The problem occurred on some Linux installations only.

In fact this is a servlet running under Tomcat - so you may have a good
point there. But I don't think it is started by an apache daemon.

Will check it. Thanks
Kristian

> -----Original Message-----
> From: Morus Walter [mailto:morus.walter@tanto.de]
> Sent: 13. april 2005 08:38
> To: java-user@lucene.apache.org; ko@yawah.com
> Subject: RE: Lucene on Linux problem...
> 
> Kristian Ottosen writes:
> >
> > It seems to work now - but I would still love to see a good explanation.
> >
> How was your java application started? (Sorry if you already explained;
> I didn't keep the thread and the mailing list archive seems to be out
> of order.)
> I remember strange problems (with mkdir in my case) when using java
> servlets within a Tomcat that was started by the apache daemon.
> The problem occurred on some Linux installations only.
> The reason we found after quite some debugging was that daemon wasn't
> linked to libpthread, so java was somehow using the wrong libc (or
> something similar, I never understood all the details).
> (daemon started, loaded the unthreaded libc and handed control to the
> java binary, which continued using that libc while requiring a threaded
> one.)
> 
> I don't know if your problems have anything to do with such effects, but
> when you say it happens on some distributions while others work, that
> comes to mind.
> If the same jar file works on one Linux installation and fails on another,
> that suggests the problem is at the JVM level or in its libraries.
> 
> Morus
> 
> 






RE: Lucene on Linux problem...

Posted by Morus Walter <mo...@tanto.de>.
Kristian Ottosen writes:
> 
> It seems to work now - but I would still love to see a good explanation.
> 
How was your java application started? (Sorry if you already explained;
I didn't keep the thread and the mailing list archive seems to be out
of order.)
I remember strange problems (with mkdir in my case) when using java
servlets within a Tomcat that was started by the apache daemon.
The problem occurred on some Linux installations only.
The reason we found after quite some debugging was that daemon wasn't
linked to libpthread, so java was somehow using the wrong libc (or something
similar, I never understood all the details).
(daemon started, loaded the unthreaded libc and handed control to the
java binary, which continued using that libc while requiring a threaded one.)

I don't know if your problems have anything to do with such effects, but
when you say it happens on some distributions while others work, that
comes to mind.
If the same jar file works on one Linux installation and fails on another,
that suggests the problem is at the JVM level or in its libraries.
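When chasing this kind of JVM- or libc-level difference between installations, it can help to log exactly which JVM and OS the application actually ended up running under. A small hypothetical helper (the property keys are standard java.lang.System properties):

```java
// Print the JVM and OS details the servlet container actually runs under,
// so differences between Linux installations can be spotted from the logs.
class EnvInfo {
    public static void main(String[] args) {
        String[] keys = {
            "java.vm.vendor", "java.vm.version", "java.version",
            "os.name", "os.arch", "os.version"
        };
        for (int i = 0; i < keys.length; i++) {
            System.out.println(keys[i] + " = " + System.getProperty(keys[i]));
        }
    }
}
```

Running this once at startup on a working machine and on a failing one, and diffing the output, would at least confirm or rule out the "wrong JVM/libc" theory.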

Morus




RE: Lucene on Linux problem...

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Guys

Apologies.........



We had similar problems with Tomcat 5.0.3 on Linux Gentoo and
Lucene 1.4.3...
every first search was not able to work properly.

So finally we switched JVMs from the 1.4.2 to the 1.4.3 SDK and solved the
problem for the day....
We also updated the Linux machines with the latest patch to solve the "stale"
problems.

Maybe switching JVMs would solve some of the problems.




With regards
Karthik

-----Original Message-----
From: Kristian Ottosen [mailto:ko@yawah.com]
Sent: Tuesday, April 12, 2005 9:36 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene on Linux problem...


Hi,

Thank you for the comments and hints.

It seems like we finally solved this problem - but unfortunately without
being able to pinpoint the exact cause.

Our application did in fact follow all the Lucene concurrency rules. I
checked, double-checked, bought "Lucene in Action" (which I highly recommend,
by the way), read chapter 2.9 twice, triple-checked, and it just didn't do
anything illegal at all. Everything was closed correctly etc.

This was also backed by the fact that it worked absolutely perfectly when
using a RAMDirectory instead of an FSDirectory on top of the Linux file
system.

Further, the application also worked perfectly on Windows, Mac OS X,
Solaris and AS/400, as well as on almost all Linux systems. But in a handful
of Linux cases (different distributions, by the way) it simply behaved very,
very strangely, causing two possible symptoms:

1) After adding a number of documents we sometimes got exceptions like this
one:

java.io.FileNotFoundException:
/srv/www/tomcat/demoserver/webapps/erez3/WEB-INF/private/index/_35.fnm
(No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
        at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
        at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
        at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
        at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:53)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:110)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:122)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)

while Lucene tried to read a segment which it had already deleted while
merging it with other segments.

2) And also very often:

java.io.IOException: Lock obtain timed out:
Lock@/usr/local/tomcat/current/temp/lucene-4e8e9515c104fa3aef765398cfa76815-
write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:58)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:223)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:213)

The frequency of this second exception was highly dependent on the value of
writeLockTimeout. The table below shows the number of timeouts encountered
during a single run (adding some thousand documents) for different values of
writeLockTimeout:

writeLockTimeout (ms)  timeouts
   20                         1
  500                         1
  995                         1
  999                         1
 1000                      3600
 1100                      3252
 1500                      3645
 2000                       116
 3000                        66
 4000                        57
10000                        15
30000                         7

Strange, isn't it? Mind you, only a single thread is involved. No other
threads are reading or writing to the index. All files, including the locks,
are stored on a local file system (no NFS or anything).

What we finally did to work around the problem was to stop creating a new
IndexSearcher to check for duplicates before adding each document, and
instead re-use it during batch updates. That required a good deal of extra
care and management on the part of the application - but as a side effect it
also improved performance a bit (I guess).

But it is still a puzzle to me why things didn't work on these Linux boxes
in the first place. It almost seems like the Linux file system sometimes got
"overwhelmed" by the frequent modifications and started returning "stale"
data and/or file status to the application.

It seems to work now - but I would still love to see a good explanation.

Best Regards from Denmark
Kristian Ottosen
http://www.yawah.com


> -----Original Message-----
> From: Miles Barr [mailto:miles@runtime-collective.com]
> Sent: 4. april 2005 10:25
> To: Lucene User
> Subject: Re: Lucene on Linux problem...
>
> On Sat, 2005-04-02 at 10:29 +0200, Kristian Ottosen wrote:
> > I wonder if there is a general problem running Lucene on top of some of
> > the journaling file systems like ReiserFS or ext3?
>
> I haven't had any problems running Lucene on either of those file
> systems. I've done updates to the index while people are still searching
> and it all worked fine.
>
> > The problem may not appear immediately - but only after several
> > thousand documents have been indexed in a row using a combination of
> > IndexSearcher (check for duplicates) and IndexWriter (add document)
> > operations.
>
> Are you deleting the duplicates as you find them? I.e., are you closing
> and opening readers/writers all the time? If you are, then besides
> recommending that you switch to batch updates, I suggest being really
> thorough about closing everything, e.g.
>
> IndexReader reader = null;
> IndexWriter writer = null;
>
> try {
>     reader = IndexReader.open(/* Path to index */);
>
>     ... do the deletes ...
>
>     reader.close();
>     reader = null;
>
>     writer = new IndexWriter(index, /* Your analyzer */, false);
>
>     ... do the adds ...
>
>     writer.optimize();
>     writer.close();
>     writer = null;
> }
> catch (IOException e) {
>     ... do any error processing ...
> }
> finally {
>     // We can do both these closes in the same finally block because if
>     // reader is not null then writer is null, so we don't need to worry
>     // about reader.close() throwing another IOException
>     if (reader != null) {
>         reader.close();
>         reader = null;
>     }
>
>     if (writer != null) {
>         writer.optimize();
>         writer.close();
>         writer = null;
>     }
> }
>
>
>
> --
> Miles Barr <mi...@runtime-collective.com>
> Runtime Collective Ltd.
>
>






RE: Lucene on Linux problem...

Posted by Kristian Ottosen <ko...@yawah.com>.
Hi,

Thank you for the comments and hints.

It seems like we finally solved this problem - but unfortunately without
being able to pinpoint the exact cause.

Our application did in fact follow all the Lucene concurrency rules. I
checked, double-checked, bought "Lucene in Action" (which I highly recommend,
by the way), read chapter 2.9 twice, triple-checked, and it just didn't do
anything illegal at all. Everything was closed correctly etc.

This was also backed by the fact that it worked absolutely perfectly when
using a RAMDirectory instead of an FSDirectory on top of the Linux file
system.

Further, the application also worked perfectly on Windows, Mac OS X,
Solaris and AS/400, as well as on almost all Linux systems. But in a handful
of Linux cases (different distributions, by the way) it simply behaved very,
very strangely, causing two possible symptoms:

1) After adding a number of documents we sometimes got exceptions like this
one:

java.io.FileNotFoundException:
/srv/www/tomcat/demoserver/webapps/erez3/WEB-INF/private/index/_35.fnm
(No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
        at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
        at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
        at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
        at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:53)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:110)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:122)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
 
while Lucene tried to read a segment which it had already deleted while
merging it with other segments.

2) And also very often:

java.io.IOException: Lock obtain timed out:
Lock@/usr/local/tomcat/current/temp/lucene-4e8e9515c104fa3aef765398cfa76815-
write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:58)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:223)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:213)

The frequency of this second exception was highly dependent on the value of
writeLockTimeout. The table below shows the number of timeouts encountered
during a single run (adding some thousand documents) for different values of
writeLockTimeout:

writeLockTimeout (ms)  timeouts
   20                         1
  500                         1
  995                         1
  999                         1
 1000                      3600
 1100                      3252
 1500                      3645
 2000                       116
 3000                        66
 4000                        57
10000                        15
30000                         7

Strange, isn't it? Mind you, only a single thread is involved. No other
threads are reading or writing to the index. All files, including the locks,
are stored on a local file system (no NFS or anything).
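For context on how these timeouts arise: in Lucene 1.4 a lock is, as far as I recall, obtained by polling File.createNewFile() on the lock file and sleeping roughly a second between attempts until the timeout budget is used up, so a timeout below the poll interval allows only a single attempt. A simplified sketch of that obtain loop (not the actual Lucene source; the 1000 ms poll interval is an assumption):

```java
import java.io.File;
import java.io.IOException;

class LockSketch {
    static final long POLL_INTERVAL = 1000; // ms; assumed to match Lucene 1.4

    /** Poll for exclusive creation of the lock file until timeout expires. */
    static boolean obtain(File lockFile, long timeoutMillis)
            throws IOException, InterruptedException {
        long maxSleeps = timeoutMillis / POLL_INTERVAL;
        for (long sleeps = 0; !lockFile.createNewFile(); sleeps++) {
            if (sleeps >= maxSleeps) return false; // "Lock obtain timed out"
            Thread.sleep(POLL_INTERVAL);
        }
        return true; // lock file created atomically; lock is held
    }

    public static void main(String[] args) throws Exception {
        File lock = File.createTempFile("write", ".lock");
        // The lock file already exists, so a zero timeout fails immediately...
        System.out.println(lock.getName() + " obtained: " + obtain(lock, 0));
        lock.delete();
        // ...and once it is gone, the lock is obtained on the first attempt.
        System.out.println(lock.getName() + " obtained: " + obtain(lock, 0));
        lock.delete();
    }
}
```

With this shape, a lock file left behind (or a file system returning stale status for it) makes every subsequent obtain() burn its whole timeout before failing, which is one way a single-threaded application could still see "Lock obtain timed out".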

What we finally did to work around the problem was to stop creating a new
IndexSearcher to check for duplicates before adding each document, and
instead re-use it during batch updates. That required a good deal of extra
care and management on the part of the application - but as a side effect it
also improved performance a bit (I guess).
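Stripped of the Lucene classes (so the sketch stays self-contained), the workaround amounts to caching one searcher per batch and reopening it only after the index has actually been modified. `Searcher` below is a stand-in for IndexSearcher, and the whole class is a hypothetical illustration of the pattern, not code from the application:

```java
class SearcherCache {
    /** Stand-in for org.apache.lucene.search.IndexSearcher. */
    static class Searcher {
        boolean closed = false;
        void close() { closed = true; }
    }

    private Searcher current;
    private boolean dirty = true; // index modified since the last open

    /** Reuse one searcher across the batch; reopen only after writes. */
    Searcher getSearcher() {
        if (dirty) {
            if (current != null) current.close(); // release the stale one
            current = new Searcher(); // new IndexSearcher(path) in real code
            dirty = false;
        }
        return current;
    }

    /** Call after each IndexWriter add/close cycle. */
    void indexModified() { dirty = true; }

    public static void main(String[] args) {
        SearcherCache c = new SearcherCache();
        Searcher a = c.getSearcher();
        System.out.println(a == c.getSearcher()); // true: reused in the batch
        c.indexModified();
        System.out.println(a == c.getSearcher()); // false: reopened after write
    }
}
```

The real version would construct new IndexSearcher(indexPath) where the stand-in is created, and call indexModified() after each document is added; the point is that the open/close churn per document disappears.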

But it is still a puzzle to me why things didn't work on these Linux boxes
in the first place. It almost seems like the Linux file system sometimes got
"overwhelmed" by the frequent modifications and started returning "stale"
data and/or file status to the application. 

It seems to work now - but I would still love to see a good explanation.

Best Regards from Denmark
Kristian Ottosen
http://www.yawah.com


> -----Original Message-----
> From: Miles Barr [mailto:miles@runtime-collective.com]
> Sent: 4. april 2005 10:25
> To: Lucene User
> Subject: Re: Lucene on Linux problem...
> 
> On Sat, 2005-04-02 at 10:29 +0200, Kristian Ottosen wrote:
> > I wonder if there is a general problem running Lucene on top of some of
> > the journaling file systems like ReiserFS or ext3?
> 
> I haven't had any problems running Lucene on either of those file
> systems. I've done updates to the index while people are still searching
> and it all worked fine.
> 
> > The problem may not appear immediately - but only after several
> > thousand documents have been indexed in a row using a combination of
> > IndexSearcher (check for duplicates) and IndexWriter (add document)
> > operations.
> 
> Are you deleting the duplicates as you find them? I.e., are you closing
> and opening readers/writers all the time? If you are, then besides
> recommending that you switch to batch updates, I suggest being really
> thorough about closing everything, e.g.
> 
> IndexReader reader = null;
> IndexWriter writer = null;
> 
> try {
>     reader = IndexReader.open(/* Path to index */);
> 
>     ... do the deletes ...
> 
>     reader.close();
>     reader = null;
> 
>     writer = new IndexWriter(index, /* Your analyzer */, false);
> 
>     ... do the adds ...
> 
>     writer.optimize();
>     writer.close();
>     writer = null;
> }
> catch (IOException e) {
>     ... do any error processing ...
> }
> finally {
>     // We can do both these closes in the same finally block because if
>     // reader is not null then writer is null, so we don't need to worry
>     // about reader.close() throwing another IOException
>     if (reader != null) {
>         reader.close();
>         reader = null;
>     }
> 
>     if (writer != null) {
>         writer.optimize();
>         writer.close();
>         writer = null;
>     }
> }
> 
> 
> 
> --
> Miles Barr <mi...@runtime-collective.com>
> Runtime Collective Ltd.
> 
> 






Re: Lucene on Linux problem...

Posted by Miles Barr <mi...@runtime-collective.com>.
On Sat, 2005-04-02 at 10:29 +0200, Kristian Ottosen wrote:
> I wonder if there is a general problem running Lucene on top of some of the
> journaling file systems like ReiserFS or ext3?

I haven't had any problems running Lucene on either of those file
systems. I've done updates to the index while people are still searching
and it all worked fine.

> The problem may not appear immediately - but only after several
> thousand documents have been indexed in a row using a combination of
> IndexSearcher (check for duplicates) and IndexWriter (add document)
> operations.

Are you deleting the duplicates as you find them? I.e., are you closing
and opening readers/writers all the time? If you are, then besides
recommending that you switch to batch updates, I suggest being really
thorough about closing everything, e.g.

IndexReader reader = null;
IndexWriter writer = null;

try {
    reader = IndexReader.open(/* Path to index */);

    ... do the deletes ...

    reader.close();
    reader = null;

    writer = new IndexWriter(index, /* Your analyzer */, false);

    ... do the adds ...

    writer.optimize();
    writer.close();
    writer = null;
}
catch (IOException e) {
    ... do any error processing ...
}
finally {
    // We can do both these closes in the same finally block because if
    // reader is not null then writer is null, so we don't need to worry
    // about reader.close() throwing another IOException
    if (reader != null) {
        reader.close();
        reader = null;
    }

    if (writer != null) {
        writer.optimize();
        writer.close();
        writer = null;
    }
}



-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.

