Posted to java-user@lucene.apache.org by BrightMinds Dev <de...@brightminds.org> on 2011/02/08 19:23:34 UTC

HA Configuration / Best Practices

Hi,

We are developing a site with a 4-tier design (RP, UI, WS, DB), and on 
the WS tier we are looking at how to set up Lucene in an HA 
configuration, i.e. with no single point of failure.  The initial 
deployment will involve pairs of servers at each tier.

As there are at least 2 servers at the WS (Lucene) tier that implies at 
least 2 indexes.

As far as best practices go:

1) What is the typical architecture for Lucene in a HA configuration?

2) How are indexes typically kept in sync?  For example, if a search 
request from the UI tier returns a set of results and we then want the 
next page, but we aren't using, say, stickiness, the second request may 
hit a server whose index differs from the first.  This could be 
problematic.  No?  How are these issues solved?

3) What is typically done on the DB side to keep track of updates?  E.g. 
having a last-indexed timestamp is great, but if you have 2 indexes, are 
you adding 2 columns to each table?

4) Are there any white papers or references worth looking at?

FYI, the site is being designed to scale to millions of users and 
already incorporates sharding for user data and related content.

Much Appreciated.

--Nikolaos

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: HA Configuration / Best Practices

Posted by Ian Lea <ia...@gmail.com>.

One way, not necessarily typical or best practice, but known to work,
is to designate one of your WS layer machines, or another server, as
the master indexer.  Run all index updates on that server and copy
the index out to the other server(s) using rsync.  That is normally
quick since rsync only transfers what has changed.  Have something that
notifies the slave servers when they need to reopen their indexes so
that searches don't get out of sync.
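
A rough sketch of that master/slave flow in plain Java, for illustration
only.  Everything named here is an assumption rather than anything from
Lucene or rsync itself: the class and method names, the example paths and
host name, and the idea of the master publishing an increasing
"generation" number after each successful copy.  The Runnable stands in
for whatever your code does to reopen its Lucene reader/searcher.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: names and the generation scheme are illustrative assumptions.
class IndexSyncSketch {

    // Builds the rsync invocation the master would run to push its index
    // directory to one search server. -a preserves file attributes and
    // --delete removes files on the target that the master no longer has
    // (for example segments that were merged away).
    static List<String> rsyncCommand(String indexDir, String host, String remoteDir) {
        return Arrays.asList("rsync", "-a", "--delete",
                indexDir + "/", host + ":" + remoteDir + "/");
    }

    // Lives on each search server and remembers which generation is open.
    static final class ReopenTracker {
        private long openGeneration = -1;
        private final Runnable reopenAction;

        ReopenTracker(Runnable reopenAction) {
            this.reopenAction = reopenAction;
        }

        // Called when the master announces that a generation has been copied.
        // Reopens only if that generation is newer than the one already open,
        // so duplicate or out-of-order notifications are harmless.
        synchronized boolean onNotify(long publishedGeneration) {
            if (publishedGeneration <= openGeneration) {
                return false;
            }
            reopenAction.run();
            openGeneration = publishedGeneration;
            return true;
        }

        synchronized long openGeneration() {
            return openGeneration;
        }
    }
}
```

The master would run the command returned by rsyncCommand (for instance
via ProcessBuilder) after committing its changes, then notify each search
server by whatever channel suits you (an HTTP call, a marker file, a
message queue).  Tracking a generation rather than a bare "reopen now"
flag also gives the UI tier something to compare if it needs to know
whether two servers are serving the same index.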

Or look at Solr.  I believe it takes care of most of this out of the box.


--
Ian.



