Posted to java-user@lucene.apache.org by mitu2009 <mu...@gmail.com> on 2009/06/19 06:10:42 UTC

Synchronizing Lucene indexes across 2 application servers

I have a web application that uses Lucene for search. Lucene search
requests are served by web services running on 2 application servers
(IIS 7). The 2 application servers are load balanced using "netscaler".

Each server runs a nightly batch job that updates the search index on
that server.

I need to synchronize the search indexes on these 2 servers so that both
servers have up-to-date indexes at all times. What would be the best
architecture/design strategy for this, given that either of the 2
application servers could be serving a search request, depending on its
availability?

Any input, please?

Thanks for reading!



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Synchronizing Lucene indexes across 2 application servers

Posted by Ian Lea <ia...@gmail.com>.
Or have a third, master index, as Joel suggests, and apply all updates to
that index only. Then, at the end of each batch index update run, use rsync
or equivalent to push the master index out to the 2 search servers and
tell them to reopen their indexes.
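
A minimal sketch of the "push, then reopen" step, assuming plain file copies
stand in for rsync. Everything named here (SnapshotPublisher, the
"index-<version>" directories, the "current" pointer file) is hypothetical
and not part of Lucene. Each push lands in a fresh directory and the pointer
file is swapped last, so a search server never opens a half-copied index.
After the swap, the server would reopen its reader on the new directory with
whatever reopen call its Lucene version provides.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: copies a master index to a search server's disk. */
final class SnapshotPublisher {

    /**
     * Copies every file of the master index into replicaRoot/index-<version>,
     * then repoints replicaRoot/current at the new directory.
     */
    static Path publish(Path masterIndex, Path replicaRoot, String version)
            throws IOException {
        Path target = replicaRoot.resolve("index-" + version);
        Files.createDirectories(target);

        // Assumption: the index is a flat directory of files, no subdirectories.
        List<Path> files;
        try (Stream<Path> listing = Files.list(masterIndex)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.copy(file, target.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }

        // Write the pointer to a temp file, then rename it over the old pointer.
        // Whether ATOMIC_MOVE replaces an existing target is platform-specific.
        Path tmp = replicaRoot.resolve("current.tmp");
        Files.write(tmp, target.getFileName().toString().getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, replicaRoot.resolve("current"), StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    /** Returns the index directory the search server should currently have open. */
    static Path currentIndex(Path replicaRoot) throws IOException {
        byte[] raw = Files.readAllBytes(replicaRoot.resolve("current"));
        return replicaRoot.resolve(new String(raw, StandardCharsets.UTF_8).trim());
    }
}
```

Old "index-<version>" directories are left in place here; a real setup would
delete them once no searcher still has them open.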


--
Ian.


On Fri, Jun 19, 2009 at 9:23 AM, Joel Halbert <jo...@su3analytics.com> wrote:

> Do they have to be kept in sync in real time?
> Does each server handle writes to its own index, which then need to be
> propagated to the other server's index?
>
> From a simplicity point of view, to minimise the amount of
> self-consistency checking that needs to happen, I would even suggest
> having a third, master index, to which all writes happen. As writes are
> applied to the master they are then propagated to the 2 servers. You then
> just need to keep track of the latest document written to each of the two
> "slave" servers, and in case of failure/recovery on either you just
> request all deltas since the last known record on each.

Re: Synchronizing Lucene indexes across 2 application servers

Posted by Joel Halbert <jo...@su3analytics.com>.
Do they have to be kept in sync in real time?
Does each server handle writes to its own index, which then need to be
propagated to the other server's index?

From a simplicity point of view, to minimise the amount of self-consistency
checking that needs to happen, I would even suggest having a third, master
index, to which all writes happen. As writes are applied to the master they
are then propagated to the 2 servers. You then just need to keep track of
the latest document written to each of the two "slave" servers, and in case
of failure/recovery on either you just request all deltas since the last
known record on each.
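
The bookkeeping described above could be sketched roughly as follows. This is
only an illustration under assumptions: UpdateLog and Replica are hypothetical
names, updates are opaque strings, and the log lives in memory, whereas a real
master would persist it and each replica would apply the updates to its own
Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical append-only log of updates applied to the master index. */
final class UpdateLog {
    /** One logged update: a sequence number plus an opaque description. */
    static final class Entry {
        final long seq;
        final String op;
        Entry(long seq, String op) { this.seq = seq; this.op = op; }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Records an update and returns its sequence number (1, 2, 3, ...). */
    synchronized long append(String op) {
        long seq = entries.size() + 1;
        entries.add(new Entry(seq, op));
        return seq;
    }

    /** Returns every update with a sequence number above lastSeen, oldest first. */
    synchronized List<Entry> deltasSince(long lastSeen) {
        if (lastSeen < 0 || lastSeen > entries.size()) {
            throw new IllegalArgumentException("unknown sequence number: " + lastSeen);
        }
        return new ArrayList<>(entries.subList((int) lastSeen, entries.size()));
    }
}

/** Hypothetical search server that remembers the last update it applied. */
final class Replica {
    private long lastApplied = 0;
    private final List<String> applied = new ArrayList<>();

    /** Pulls and applies everything logged since lastApplied; returns the count. */
    int catchUp(UpdateLog master) {
        List<UpdateLog.Entry> deltas = master.deltasSince(lastApplied);
        for (UpdateLog.Entry entry : deltas) {
            applied.add(entry.op);   // a real replica would update its index here
            lastApplied = entry.seq;
        }
        return deltas.size();
    }

    long lastApplied() { return lastApplied; }

    List<String> applied() { return Collections.unmodifiableList(applied); }
}
```

A server that was down simply calls catchUp again when it returns; because it
only ever asks for entries after its own lastApplied value, the two servers
converge on the same state without comparing indexes directly.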




-- 
Joel Halbert
020 3051 8637
075 2501 0825
joel@su3analytics.com
www.su3analytics.com
www.storequery.com
SU3 Analytics Ltd, The Print House, 18 Ashwin St, London E8 3DL.



Re: Synchronizing Lucene indexes across 2 application servers

Posted by Ken Krugler <kk...@transpac.com>.
>I have a web application that uses Lucene for search. Lucene search
>requests are served by web services running on 2 application servers
>(IIS 7). The 2 application servers are load balanced using "netscaler".
>
>Each server runs a nightly batch job that updates the search index on
>that server.
>
>I need to synchronize the search indexes on these 2 servers so that both
>servers have up-to-date indexes at all times. What would be the best
>architecture/design strategy for this, given that either of the 2
>application servers could be serving a search request, depending on its
>availability?

You could use Katta for this, as another option - it's an open source 
distributed Lucene search system.

Under the hood Katta uses ZooKeeper to handle distribution of data to 
multiple servers. Once Katta has added an index to both systems, then 
you can switch to it (and eventually remove the old index).

The fact that you'd need two Katta "masters" makes things a bit more 
interesting, as you'd have to coordinate when they both decide to 
switch to using the new index(es).
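
The coordination problem can be illustrated with a small sketch. To be clear,
this is not Katta's or ZooKeeper's API: SwitchCoordinator and its methods are
hypothetical, and the state is held in one process, whereas in practice it
would live in a shared service. The idea is only that the active index version
changes once every server has reported the new version as loaded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical coordinator: flips the active index only when all servers are ready. */
final class SwitchCoordinator {
    private final Set<String> servers;
    private final Map<String, Set<String>> readyByVersion = new HashMap<>();
    private String activeVersion;

    SwitchCoordinator(Set<String> servers, String initialVersion) {
        this.servers = new HashSet<>(servers);
        this.activeVersion = initialVersion;
    }

    /**
     * Called by a server once it has fully loaded the given index version.
     * Returns true if this report completed the set and caused the switch.
     */
    synchronized boolean reportReady(String server, String version) {
        if (!servers.contains(server)) {
            throw new IllegalArgumentException("unknown server: " + server);
        }
        Set<String> ready = readyByVersion.computeIfAbsent(version, v -> new HashSet<>());
        ready.add(server);
        if (ready.containsAll(servers) && !version.equals(activeVersion)) {
            activeVersion = version;   // every server has it: safe to switch
            return true;
        }
        return false;
    }

    /** The index version that searches should currently be served from. */
    synchronized String activeVersion() {
        return activeVersion;
    }
}
```

Until the last server reports in, both servers keep answering from the old
index, so the load balancer never mixes results from two index generations.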

-- Ken
-- 
Ken Krugler
+1 530-210-6378



Re: Synchronizing Lucene indexes across 2 application servers

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

You may want to look at Lucene's younger brother named Solr: http://lucene.apache.org/solr/

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




