Posted to java-user@lucene.apache.org by Morus Walter <mo...@tanto-xipolis.de> on 2003/07/30 16:11:14 UTC

Lucene Index on NFS Server

Hi,

I'm currently planning a web application using Lucene for search.

There will be two web server machines responsible for the application
and the searches. Two machines mainly for failover; load is not
expected to be a problem initially, though this might change over time.
So scaling is a minor concern but not completely irrelevant.

Now the question is where to put the Lucene index:
a) Each web server holds a copy of the Lucene index.
   In this case the index might be updated on a third copy and then
   copied to the web servers (e.g. once an hour), or each web
   server updates its copy of the index independently.
b) The Lucene index is put on an NFS server, hosted by a third machine.

The index is modified often. The expected index size is a few tens of
thousands of documents, each with about 2 kB of text.
The number of documents might grow over time, but it's not expected
to exceed a few hundred thousand documents in the foreseeable future.

From these numbers I expect an index that could be cached by the search
machines if the index is on an NFS server. So NFS performance issues
should be mitigated.
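
As a rough check of that expectation, the arithmetic behind the document
counts quoted above can be sketched as follows. The concrete counts
(30,000 and 300,000) and the class name are assumptions chosen only for
illustration, and the result is raw text volume, not the on-disk index
size, which depends on analysis and stored fields and is not derived here.
With these assumed numbers the raw text comes to roughly 59 MB today and
roughly 586 MB at the upper bound.

```java
class IndexSizeEstimate {
    /** Raw text volume in MB (1 MB = 1024 kB) for a document count and a per-document size in kB. */
    public static double rawTextMegabytes(long documents, double kBytesPerDocument) {
        return documents * kBytesPerDocument / 1024.0;
    }

    public static void main(String[] args) {
        // 30,000 and 300,000 are assumed stand-ins for the "few tens of thousands"
        // and "few hundred thousand" documents mentioned in the message.
        System.out.println("today  (MB): " + rawTextMegabytes(30_000, 2.0));
        System.out.println("future (MB): " + rawTextMegabytes(300_000, 2.0));
    }
}
```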

Are there any problems with Lucene related to such a setup?
Can I have a Lucene index on an NFS filesystem without problems
(access is read-only)?
 
What setup would you prefer?

Any input appreciated.

greetings
	Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Lucene Index on NFS Server

Posted by Jan Agermose <ja...@agermose.dk>.
I just think that if you can make ONE part of the system
hardware-failure-safe, you really do not need all this. It sounds like a
very expensive setup, and you might be able to make a one-server system
just as hardware-failure-safe if you spent all the money on that one
computer: RAID 5, dual power supplies and so on (all very expensive :-) ).

I'm no expert on HA systems, but http://linux-ha.org/ is a very good
starting point for looking at filesystems and related topics.

JMS: if you set up a publish/subscribe system, you could publish the
insert/delete/update commands and be SURE that if you can deliver the
published message, ALL the subscribers will eventually get it and act
upon it, even messages that are "delivered" during one webserver's
downtime. And this scales if you add more than two servers. If you
implement this using HTTP requests, YOU need to handle storing messages
during downtime and recovery, and again if you move from a 2-server
setup to a 3-server setup...

http://java.sun.com/products/jms/tutorial/1_3_1-fcs/doc/basics.html#1023551
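
To make the delivery guarantee described above concrete, here is a minimal
in-memory sketch of the idea. It deliberately does not use the JMS API:
the class DurableTopicSketch and its methods are hypothetical names, and a
real setup would rely on a JMS provider with durable subscriptions
instead. The sketch only shows the behaviour being relied on: each
registered subscriber has its own queue, so commands published while a
web server is down are still waiting when it comes back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class DurableTopicSketch {
    /** One pending-message queue per subscriber; messages wait here while the subscriber is down. */
    private final Map<String, Queue<String>> pending = new LinkedHashMap<>();

    /** Registers a subscriber (e.g. one web server) so that later messages are retained for it. */
    public void subscribe(String subscriber) {
        pending.putIfAbsent(subscriber, new ArrayDeque<>());
    }

    /** Publishes an index command; every registered subscriber gets its own copy. */
    public void publish(String command) {
        for (Queue<String> queue : pending.values()) {
            queue.add(command);
        }
    }

    /** Called by a subscriber when it is up: returns and clears what it missed, in publish order. */
    public List<String> drain(String subscriber) {
        List<String> delivered = new ArrayList<>();
        Queue<String> queue = pending.get(subscriber);
        if (queue == null) {
            return delivered; // unknown subscriber: nothing was retained for it
        }
        while (!queue.isEmpty()) {
            delivered.add(queue.remove());
        }
        return delivered;
    }

    public static void main(String[] args) {
        DurableTopicSketch topic = new DurableTopicSketch();
        topic.subscribe("web1");
        topic.subscribe("web2");
        topic.publish("add doc 17");
        System.out.println("web1 applies: " + topic.drain("web1"));
        topic.publish("delete doc 4"); // web2 is assumed to be down during both publishes
        System.out.println("web2 catches up: " + topic.drain("web2"));
    }
}
```

Adding a third web server in this model is one more subscribe() call; the
publisher does not change, which is the scaling point made above.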

Jan Agermose


Re: Lucene Index on NFS Server

Posted by Morus Walter <mo...@tanto-xipolis.de>.
Hi Jan,

thanks for your answer.

> What part of the webserver are you expecting that will fail? The service or
> the computer? Why would the computer hosting NFS be less likely to fail than
> your computer hosting the webserver?
> 
The computer.
Of course you're right about the NFS server. That's one drawback, but the
idea is to have a RAID system that can be switched to another machine
if the first machine fails. If the RAID system has enough internal
redundancy, its failure should be reasonably improbable.
It remains a single point of failure though.

This is not about a very high availability solution.
We just want a bit more than relying on one machine.
Scenarios like a manual switchover when a machine fails are OK.

> You could use JMS to communicate updates to the two webservers? Or use a

So far I thought about simple HTTP calls to send the import/delete
requests to the webservers. They are servlet servers anyway.
What improvements would you expect from using JMS?

> distributed FS on the two computers hosting the webservers (and not using a
> third computer)?
> 
That's an interesting idea. Unfortunately we don't have any experience
with such a setup. Any suggestions for Intel/Linux?

And how reliable is such a solution with respect to consistency of the
Lucene index? I mean, in this scenario one of the webservers would do the
import. Would it be safe to simply reopen searchers on the other webserver?
Basically that's the same question as for the NFS server.

greetings
	Morus



Re: Lucene Index on NFS Server

Posted by Jan Agermose <ja...@agermose.dk>.
What part of the webserver are you expecting to fail? The service or
the computer? Why would the computer hosting NFS be less likely to fail than
the computer hosting the webserver?

You could use JMS to communicate updates to the two webservers? Or use a
distributed FS on the two computers hosting the webservers (and not using a
third computer)?

Jan


Re: Lucene Index on NFS Server

Posted by Doug Cutting <cu...@lucene.com>.
I don't know the details of how lock files are unreliable over NFS, only 
that they are.  The window of vulnerability, when the lock file is used, 
is when one JVM is opening all of the files in an index, and another is 
completing an update at the same time.  If the updating machine removes 
some files after the opening machine has read the 'segments' file but 
before it has opened all of the files, then the open will fail with a 
FileNotFoundException.  If your application can guarantee that indexes 
are not opened while an update is completing (under IndexWriter.close() 
or IndexReader.close() for deletions) then this will not be a problem.
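
Where an application cannot give that guarantee, one possible way to
tolerate the window is to retry the open when it fails. This is not
something proposed in this thread, only a sketch of a common workaround.
It assumes the failure surfaces as java.io.FileNotFoundException; the
Opener interface and the class name are hypothetical stand-ins for
whatever call opens the index, so no Lucene API is shown here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class RetryingOpen {
    /** Hypothetical stand-in for the call that opens the index (reader or searcher). */
    public interface Opener<T> {
        T open() throws IOException;
    }

    /**
     * Calls opener.open() up to maxAttempts times. Only FileNotFoundException is retried,
     * since that is the failure expected when files are removed in the middle of an open;
     * any other IOException is passed straight through.
     */
    public static <T> T openWithRetry(Opener<T> opener, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        FileNotFoundException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return opener.open();
            } catch (FileNotFoundException e) {
                last = e; // an update probably completed underneath us
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // give the update time to finish
                }
            }
        }
        throw last; // every attempt failed; report the most recent failure
    }
}
```

A caller would wrap its own open call in the Opener, for example
openWithRetry(() -> openMyIndex(), 5, 200L), where openMyIndex is again a
placeholder. The retry count and pause are arbitrary and would need
tuning against how long updates take to complete.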

Doug

Morus Walter wrote:
> Doug Cutting writes:
> 
> 
>>>Can I have a lucene index on a NFS filesystem without problems
>>>(access is readonly)?
>>
>>So long as all access is read-only, there should not be a problem.  Keep 
>>in mind however that lock files are known to not work correctly over NFS.
>>
> 
> Hmm. Sorry, I was a bit imprecise (at least in the quoted part), so I'm 
> not sure if I understood that correctly. Access over NFS is read-only, but 
> there would be write access on the NFS server itself (local filesystem).
> Is this OK? Or should I use an "update a copy of the index and exchange
> indexes afterwards" strategy?
> 
> TIA
> 	Morus
> 


Re: Lucene Index on NFS Server

Posted by Morus Walter <mo...@tanto-xipolis.de>.
Doug Cutting writes:

> > Can I have a lucene index on a NFS filesystem without problems
> > (access is readonly)?
> 
> So long as all access is read-only, there should not be a problem.  Keep 
> in mind however that lock files are known to not work correctly over NFS.
> 
Hmm. Sorry, I was a bit imprecise (at least in the quoted part), so I'm 
not sure if I understood that correctly. Access over NFS is read-only, but 
there would be write access on the NFS server itself (local filesystem).
Is this OK? Or should I use an "update a copy of the index and exchange
indexes afterwards" strategy?
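
For illustration, the "update a copy and exchange" strategy could be
sketched roughly as below. This rests on assumptions not stated in the
thread: each index version is built completely in its own subdirectory,
a small pointer file (the name "current" and the class IndexSwap are made
up here) names the active version, and there is a single updater. Only
java.nio.file is used, not Lucene itself. How quickly NFS clients notice
the new pointer depends on client-side caching, which this sketch does
not address.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class IndexSwap {
    private static final String POINTER = "current"; // hypothetical pointer-file name

    /** Makes a fully built index directory the active one by replacing the pointer file. */
    public static void publish(Path root, String versionDirName) throws IOException {
        if (!Files.isDirectory(root.resolve(versionDirName))) {
            throw new IOException("index version not found: " + versionDirName);
        }
        // Write the new pointer next to the old one first (single updater assumed).
        Path tmp = root.resolve(POINTER + ".tmp");
        Files.write(tmp, versionDirName.getBytes(StandardCharsets.UTF_8));
        // Rename within one directory so readers see the old or the new pointer, never a
        // partial one. Replacing an existing target with ATOMIC_MOVE is implementation
        // specific in Java; it is assumed to work here as it does with POSIX rename.
        Files.move(tmp, root.resolve(POINTER), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Resolves the directory a searcher should open right now. */
    public static Path current(Path root) throws IOException {
        byte[] bytes = Files.readAllBytes(root.resolve(POINTER));
        return root.resolve(new String(bytes, StandardCharsets.UTF_8).trim());
    }
}
```

The updater would build the new index under a fresh directory name, call
publish(root, name) once it is closed, and keep older directories around
until searchers that still have them open have switched over.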

TIA
	Morus


Re: Lucene Index on NFS Server

Posted by Doug Cutting <cu...@lucene.com>.
Morus Walter wrote:
> Can I have a lucene index on a NFS filesystem without problems
> (access is readonly)?

So long as all access is read-only, there should not be a problem.  Keep 
in mind however that lock files are known to not work correctly over NFS.

Doug

