Posted to solr-user@lucene.apache.org by David Thompson <gu...@yahoo.com> on 2010/06/30 22:16:35 UTC

Multiple Solr servers and a shared index vs master+slaves

I'm a newbie looking at setting up an intranet search service using Solr, and I'm having a hard time understanding why I should forgo the high availability and clustering mechanisms we already have available and use Solr's implementations instead. I'm hoping some experienced Solr architects could take the time to comment.

Our corporate standard is for any Java web app to be deployed as an EAR file targeted to a 4-server WebLogic 10.3 cluster on virtual Solaris boxes, operating behind a cluster of Apache web servers. All servers have NFS mounts to high availability SANs. So my Solr proof-of-concept tries to make use of those tools. I've deployed Solr to the cluster, and all of the servers use the same solr.home on the NFS mount. This seems to be just fine for searching: query requests are evenly distributed across the cluster, and search performance seems to be fine with the index living on the NFS mount.

The problems, of course, start when add/update requests come in. This setup is the equivalent of having 4 standalone Solr servers using the same index. In my testing so far, the "simple" lock file mechanism seems to keep them all separate just fine, except that when the first update comes in to serverA, it grabs the write lock, and any other server that receives an update at nearly the same time must wait for serverA to release the write lock after it commits. I think I can pretty well mitigate this by directing all updates through a single server (via virtual IP address), but then I need the other servers to realize the index has changed after each commit. It looks like I can make a call like http://serverB/solr/update/extract?commit=true and that's good enough to get it to open a new reader, but that seems a little clunky. I've read in another thread about the use of "commit hooks" that can trigger user-defined events, I think, so I'm looking into that now.
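
For illustration, the "tell the other servers to reopen their readers" step could be scripted roughly as below. This is only a sketch under assumptions: the function names (commit_url, notify_readers) are made up for this example, the host names are placeholders, and the /solr/update path is an assumed default; the call described above used /solr/update/extract, which can be passed in instead. Something like this could be run by whatever fires after a commit on the writing server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(host, path="/solr/update"):
    """Build a commit URL for one server (path is an assumption; adjust as needed)."""
    return "http://%s%s?%s" % (host, path, urlencode({"commit": "true"}))


def notify_readers(hosts, path="/solr/update", fetch=None, timeout=10):
    """Send a commit request to each read-only server so it opens a new reader.

    Returns a dict mapping host -> True if the server answered HTTP 200,
    False if it answered otherwise or could not be reached.
    `fetch` is a hook (hypothetical, for testing) that takes a URL and
    returns an HTTP status code.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=timeout) as resp:
                return resp.status

    results = {}
    for host in hosts:
        try:
            results[host] = fetch(commit_url(host, path)) == 200
        except OSError:
            # URLError and HTTPError are OSError subclasses; one unreachable
            # server should not stop the others from being notified.
            results[host] = False
    return results


if __name__ == "__main__":
    # Placeholder host names; serverA is assumed to be the single writer.
    print(notify_readers(["serverB", "serverC", "serverD"]))
```

Collecting a per-host result instead of raising on the first failure lets the caller retry or alert on just the servers that did not respond.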

Now when I look at using Solr's master+slaves architecture, I feel like it's duplicating the trusted (and expensive) services we already have at our disposal. WebLogic+Apache clusters do a good job of distributing load, monitoring health, failing over, restarting, etc. And if we used slaves that pulled index snapshots, they'd be using (by policy) the same NFS mount to store those snapshots, so we'd be pulling it over the wire only to write it right next to the original index. If we didn't have these HA clustering mechanisms available already, then I'm sure I'd be much more willing to look at a Solr master+slave architecture. But since we do, it seems like I'm a little bit hamstrung to use Solr's mechanisms anyway. So, that's my scenario, comments welcome. :)

-dKt

Re: Multiple Solr servers and a shared index vs master+slaves

Posted by Lance Norskog <go...@gmail.com>.

Yes, the Java replication system is not designed for your case. Using
the LockFactory system with one writer is not how Solr is generally
QA'd, but it should work if the locking system is perfect.

But, is it possible to use hard links? Each Solr would have its own
data/index directory. Every index update would remove the links from
the readers' data/index and then hard link the contents of the
writer's data/index. This gives you a way around relying on the
locking system.

I have done the hard link trick, and I prefer to make the reader's
links read-only.
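
A rough sketch of the hard link refresh described above, for illustration only. The function name relink_index and the directory paths are hypothetical, and skipping a lock file named write.lock is an assumption about what the writer's directory contains. Hard links only work when both directories are on the same filesystem.

```python
import os


def relink_index(writer_dir, reader_dir, skip=("write.lock",)):
    """Replace the files in reader_dir with hard links to the files in writer_dir.

    Returns the sorted list of file names that were linked.
    """
    os.makedirs(reader_dir, exist_ok=True)

    # Step 1: remove the reader's existing links. This only drops the
    # reader's names for the files; the writer's copies are untouched.
    for name in os.listdir(reader_dir):
        path = os.path.join(reader_dir, name)
        if os.path.isfile(path):
            os.unlink(path)

    # Step 2: hard link every regular file from the writer's directory,
    # leaving out lock files (assumed name) so the reader never sees them.
    linked = []
    for name in sorted(os.listdir(writer_dir)):
        src = os.path.join(writer_dir, name)
        if name in skip or not os.path.isfile(src):
            continue
        os.link(src, os.path.join(reader_dir, name))
        linked.append(name)
    return linked


if __name__ == "__main__":
    # Hypothetical layout: one writer index and one directory per reader.
    print(relink_index("/solr/writer/data/index", "/solr/readerB/data/index"))
```

The read-only step is left out of the sketch: permission bits live on the shared inode, so a chmod through the reader's link would also change the file as the writer sees it. Note too that between step 1 and step 2 the reader's directory is briefly empty, so the reader should be told to reopen only after the function returns.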

(And of course you would also do this with the spellcheck indexes if
you use them.)

-- 
Lance Norskog
goksron@gmail.com