Posted to users@solr.apache.org by Sam Lee <sa...@yahoo.com.INVALID> on 2022/03/13 15:23:10 UTC

How to run Solr on two servers for redundancy

How do I run Apache Solr on two servers such that I will still be able
to index and query even if one of the servers is taken offline? In other
words, I am looking for high availability with automatic failover.

I have looked at SolrCloud. However, it uses Apache ZooKeeper which
requires at least 3 servers to be able to tolerate the failure of 1
server. Are there alternative methods?
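
The "at least 3 servers" figure follows from ZooKeeper needing a strict majority of the ensemble to be reachable. A minimal Python sketch of that arithmetic, assuming simple majority quorums (the function names are illustrative and not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With two servers the majority is still two, so a two-node ensemble tolerates no failures; three is the smallest ensemble that survives the loss of one.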

I only have two servers that are able to run Solr. Supposing I have a
third low-spec server, is it a good idea to run Solr (SolrCloud) on only
two of the three servers, while running ZooKeeper on all three servers?
However, this looks like a waste because the third server would
essentially be doing nothing. Is there a better method with standalone
Solr?

Re: How to run Solr on two servers for redundancy

Posted by Sam Lee <sa...@yahoo.com.INVALID>.

On 2022/03/13 21:33:55 Dave wrote:
> You’re on the right idea, in my opinion. Three identical “slave”
> servers with one “master” ...

Thank you for the suggestion. I have a few questions:

* Are you suggesting to use standalone Solr instead of SolrCloud?
* Why does this setup require 4 servers (1 master + 3 slaves)?
  Note that I only have two servers (+ 1 low-spec server).

> ... with an nginx server on each “slave” with the servers
> augmented.

* How does Nginx come into the picture? What is it used for?

> N1 has a 2->s3, n2 has n3->s2, n3 has s3->s1, and all three finally fall
> to master.  You can get 5 9’s like this.
>
> Pro tip: keep all action on one until it falls, and never use over 31
> GB heap size
>
> This is just a trial-and-error, complete-success option, and there is
> no need for complications with zk
> -Dave

Your idea appears to be a promising one. It's just that I don't
completely understand it yet.

Thank you.

Re: How to run Solr on two servers for redundancy

Posted by Dave <ha...@gmail.com>.

You’re on the right idea, in my opinion. Three identical “slave” servers with one “master”, with an nginx server on each “slave” with the servers augmented. N1 has a 2->s3, n2 has n3->s2, n3 has s3->s1, and all three finally fall to master.  You can get 5 9’s like this.

Pro tip: keep all action on one until it falls, and never use over 31 GB heap size

This is just a trial-and-error, complete-success option, and there is no need for complications with zk
-Dave

> On Mar 13, 2022, at 3:48 PM, Sam Lee <sa...@yahoo.com.invalid> wrote:
> 
> How do I run Apache Solr on two servers such that I will still be able
> to index and query even if one of the servers is taken offline? In other
> words, I am looking for high availability with automatic failover.
> 
> I have looked at SolrCloud. However, it uses Apache ZooKeeper which
> requires at least 3 servers to be able to tolerate the failure of 1
> server. Are there alternative methods?
> 
> I only have two servers that are able to run Solr. Supposing I have a
> third low-spec server, is it a good idea to run Solr (SolrCloud) on only
> two of the three servers, while running ZooKeeper on all three servers?
> However, this looks like a waste because the third server would
> essentially be doing nothing. Is there a better method with standalone
> Solr?

Re: How to run Solr on two servers for redundancy

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/13/22 23:28, Sam Lee wrote:
> By "standalone client", do you mean that I could use SolrJ on a separate
> server where no Solr instance is running? i.e. use the client to
> remotely connect to SolrCloud.

SolrJ is an inherent part of Solr.  But it is also a complete library by 
itself, which Java programmers can use to add Solr support to their 
programs.

> By the way, the most popular Python client, pysolr, seems to support
> SolrCloud mode. [1]

Be aware that all clients other than SolrJ are third-party -- not 
produced or maintained by the Solr project.  The pysolr client may be 
the most popular Python client ... I couldn't say because it was made by 
somebody else, not this project.  Nice that they support zookeeper 
connections ... I wasn't aware of that.

Thanks,
Shawn

Re: How to run Solr on two servers for redundancy

Posted by Joe Mocker <jm...@magnite.com.INVALID>.

This is what we do…

We have a primary and backup indexer, and a fleet of repeaters. We have a method to detect if the primary indexer has gone down and direct the repeaters to the backup indexer.
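
A rough Python sketch of the detection step, not the actual method used here. It assumes each indexer core exposes Solr's `/admin/ping` handler, and the URLs and function names are hypothetical. How the repeaters are then pointed at the chosen indexer is not shown.

```python
from http.client import HTTPException
from typing import Callable
from urllib.request import urlopen


def http_ping(core_url: str, timeout: float = 2.0) -> bool:
    """Return True if the core's ping handler answers with HTTP 200."""
    try:
        with urlopen(f"{core_url}/admin/ping", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, HTTPException):
        # Connection refused, timeout, HTTP error status, or a broken reply.
        return False


def choose_indexer(primary: str, backup: str,
                   is_up: Callable[[str], bool] = http_ping) -> str:
    """Prefer the primary indexer; fall back to the backup when it is down."""
    return primary if is_up(primary) else backup
```

For example, `choose_indexer("http://idx1:8983/solr/core1", "http://idx2:8983/solr/core1")` would return the backup URL only when the primary fails the probe. A single failed ping is a weak signal, so a real check would probably require several consecutive failures before switching.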

  —joe

> On Mar 15, 2022, at 7:12 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> 
> I am proposing Standalone Solr ;-)
> 
> You are quite right that if the indexer goes offline, then you wouldn’t see updates in your two separate Solrs….    However, assuming you aren’t in a near real time situation where your application is broken if the updates aren’t happening, then you would still be able to serve up search traffic.
> 
> If you are really worried about the indexer going offline, then just have two of them as well ;-).   Depending on your load, you could just run two indexers, one on each Solr as well.
> 
> Using SolrCloud wouldn’t help you on High Availability of the indexer ;-)
> 
> Eric
> 
> 
>> On Mar 15, 2022, at 4:26 AM, Sam Lee <sa...@yahoo.com> wrote:
>> 
>> On 2022/03/14 12:19:10 Eric Pugh wrote:
>>> Let me propose a slightly different approach ;-)
>>> 
>>> Since you don’t need Solrcloud to support scaling needs, but instead
>>> for redundancy, then I like to set things up where my indexer just
>>> sends the updates to TWO SEPARATE single server Solr nodes.
>> 
>> Are you suggesting to use Standalone Solr instead of SolrCloud?
>> 
>> If I am understanding this correctly, you are suggesting to use three
>> servers: one for indexing, and two for clients to query. Wouldn't
>> there be downtime if the indexer goes offline?
>> 
>> Thank you.
> 
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
> This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
>

Re: How to run Solr on two servers for redundancy

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/15/2022 8:12 AM, Eric Pugh wrote:
> Using SolrCloud wouldn’t help you on High Availability of the indexer ;-)

If there is a load balancer in the mix or all clients are zk aware, then 
I think SolrCloud does offer high availability for indexing.

Thanks,
Shawn

Re: How to run Solr on two servers for redundancy

Posted by Sam Lee <sa...@yahoo.com.INVALID>.

On 2022-03-15 10:12 -0400, Eric Pugh wrote:
> I am proposing Standalone Solr ;-)
>
> You are quite right that if the indexer goes offline, then you
> wouldn’t see updates in your two separate Solrs….    However, assuming
> you aren’t in a near real time situation where your application is
> broken if the updates aren’t happening, then you would still be able
> to serve up search traffic.
>
> If you are really worried about the indexer going offline, then just
> have two of them as well ;-).   Depending on your load, you could just
> run two indexers, one on each Solr as well.

I see. So you are proposing to use the setup shown in the diagram here:
https://solr.apache.org/guide/8_11/index-replication.html

> Using SolrCloud wouldn’t help you on High Availability of the indexer ;-)

I thought that SolrCloud is supposed to be a solution for achieving high
availability. Is my understanding incorrect?

Re: How to run Solr on two servers for redundancy

Posted by Eric Pugh <ep...@opensourceconnections.com>.

I am proposing Standalone Solr ;-)

You are quite right that if the indexer goes offline, then you wouldn’t see updates in your two separate Solrs….    However, assuming you aren’t in a near real time situation where your application is broken if the updates aren’t happening, then you would still be able to serve up search traffic.

If you are really worried about the indexer going offline, then just have two of them as well ;-).   Depending on your load, you could just run two indexers, one on each Solr as well.

Using SolrCloud wouldn’t help you on High Availability of the indexer ;-)

Eric

> On Mar 15, 2022, at 4:26 AM, Sam Lee <sa...@yahoo.com> wrote:
> 
> On 2022/03/14 12:19:10 Eric Pugh wrote:
>> Let me propose a slightly different approach ;-)
>> 
>> Since you don’t need Solrcloud to support scaling needs, but instead
>> for redundancy, then I like to set things up where my indexer just
>> sends the updates to TWO SEPARATE single server Solr nodes.
> 
> Are you suggesting to use Standalone Solr instead of SolrCloud?
> 
> If I am understanding this correctly, you are suggesting to use three
> servers: one for indexing, and two for clients to query. Wouldn't
> there be downtime if the indexer goes offline?
> 
> Thank you.


Re: How to run Solr on two servers for redundancy

Posted by Sam Lee <sa...@yahoo.com.INVALID>.

On 2022/03/14 12:19:10 Eric Pugh wrote:
> Let me propose a slightly different approach ;-)
>
> Since you don’t need Solrcloud to support scaling needs, but instead
> for redundancy, then I like to set things up where my indexer just
> sends the updates to TWO SEPARATE single server Solr nodes.

Are you suggesting to use Standalone Solr instead of SolrCloud?

If I am understanding this correctly, you are suggesting to use three
servers: one for indexing, and two for clients to query. Wouldn't
there be downtime if the indexer goes offline?

Thank you.

Re: How to run Solr on two servers for redundancy

Posted by Eric Pugh <ep...@opensourceconnections.com>.

Let me propose a slightly different approach ;-)

Since you don’t need SolrCloud to support scaling needs, but instead for redundancy, then I like to set things up where my indexer just sends the updates to TWO SEPARATE single-server Solr nodes.  This is great for a number of reasons:

1) Green/Blue deployments.   I can upgrade one Solr and leave the other alone.
2) I can A/B test by deploying new relevance configs to one Solr and then compare results to the other.
3) If I am in the cloud, well I can drop one Solr on AWS and the other on GCP or another cloud provider.
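
A minimal sketch of what such a dual-writing indexer could look like in Python, assuming each node accepts JSON documents on its standard `/update` handler. The host names, core name and function names are made up for illustration, and this is not a description of any particular production indexer.

```python
import json
from typing import Callable, Dict, Iterable, List
from urllib.request import Request, urlopen


def post_docs(update_url: str, docs: List[dict], timeout: float = 10.0) -> None:
    """POST a JSON array of documents to one Solr update handler."""
    req = Request(update_url, data=json.dumps(docs).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=timeout) as resp:  # raises on HTTP 4xx/5xx
        resp.read()


def index_to_all(update_urls: Iterable[str], docs: List[dict],
                 send: Callable[[str, List[dict]], None] = post_docs
                 ) -> Dict[str, bool]:
    """Send the same batch to every node and report which nodes accepted it."""
    results = {}
    for url in update_urls:
        try:
            send(url, docs)
            results[url] = True
        except OSError:
            # Node unreachable or it rejected the request; keep going.
            results[url] = False
    return results
```

A call might look like `index_to_all(["http://solr1:8983/solr/core1/update?commit=true", "http://solr2:8983/solr/core1/update?commit=true"], docs)`. The two nodes know nothing about each other, so a node that missed a batch has to be re-fed by the indexer once it is back; the returned map is there so the caller can track that.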

Eric

> On Mar 14, 2022, at 1:28 AM, Sam Lee <sa...@yahoo.com.INVALID> wrote:
> 
> On 2022/03/13 22:22:48 Shawn Heisey wrote:
>> Zookeeper has fairly low system requirements compared to Solr, so using
>> a third machine with lower specs to just run the tie-breaker ZK is a
>> good way to go.
>> 
>> Note that you'll only have full redundancy at the client level with that
>> setup if your client is ZK-aware.  The only Solr client I know about
>> that's ZK aware is the Java client, which is part of Solr itself as well
>> as being a standalone client.
> 
> Thank you for bringing this potential issue to my attention.
> 
> By "standalone client", do you mean that I could use SolrJ on a separate
> server where no Solr instance is running? i.e. use the client to
> remotely connect to SolrCloud.
> 
> By the way, the most popular Python client, pysolr, seems to support
> SolrCloud mode. [1]
> 
>> For full redundancy with HTTP-only clients you'll need a virtual IP
>> address that can be shared among the servers, and have a load balancer
>> listening on the virtual IP.  Setting that up is done with software
>> other than Solr and ZK, so it's not on-topic for this mailing list. 
>> Depending on the capabilities of the third server, it could be the
>> primary for load-balancing as well as the third machine for ZK. 
>> That's what I would do with limited resources.
> 
> I think I will stick to ZooKeeper-aware clients if I choose to go the
> SolrCloud route. Using the SolrJ "CloudSolrClient" looks like a much
> simpler solution than setting up all the infrastructure required for
> achieving high availability with HTTP-only clients.
> 
> 
>  [1]: https://github.com/django-haystack/pysolr


Re: How to run Solr on two servers for redundancy

Posted by Sam Lee <sa...@yahoo.com.INVALID>.

On 2022/03/13 22:22:48 Shawn Heisey wrote:
> Zookeeper has fairly low system requirements compared to Solr, so using
> a third machine with lower specs to just run the tie-breaker ZK is a
> good way to go.
>
> Note that you'll only have full redundancy at the client level with that
> setup if your client is ZK-aware.  The only Solr client I know about
> that's ZK aware is the Java client, which is part of Solr itself as well
> as being a standalone client.

Thank you for bringing this potential issue to my attention.

By "standalone client", do you mean that I could use SolrJ on a separate
server where no Solr instance is running? i.e. use the client to
remotely connect to SolrCloud.

By the way, the most popular Python client, pysolr, seems to support
SolrCloud mode. [1]

> For full redundancy with HTTP-only clients you'll need a virtual IP
> address that can be shared among the servers, and have a load balancer
> listening on the virtual IP.  Setting that up is done with software
> other than Solr and ZK, so it's not on-topic for this mailing list. 
> Depending on the capabilities of the third server, it could be the
> primary for load-balancing as well as the third machine for ZK. 
> That's what I would do with limited resources.

I think I will stick to ZooKeeper-aware clients if I choose to go the
SolrCloud route. Using the SolrJ "CloudSolrClient" looks like a much
simpler solution than setting up all the infrastructure required for
achieving high availability with HTTP-only clients.

  [1]: https://github.com/django-haystack/pysolr

Re: How to run Solr on two servers for redundancy

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/13/2022 9:23 AM, Sam Lee wrote:
> How do I run Apache Solr on two servers such that I will still be able
> to index and query even if one of the servers is taken offline? In other
> words, I am looking for high availability with automatic failover.
>
> I have looked at SolrCloud. However, it uses Apache ZooKeeper which
> requires at least 3 servers to be able to tolerate the failure of 1
> server. Are there alternative methods?
>
> I only have two servers that are able to run Solr. Supposing I have a
> third low-spec server, is it a good idea to run Solr (SolrCloud) on only
> two of the three servers, while running ZooKeeper on all three servers?
> However, this looks like a waste because the third server would
> essentially be doing nothing. Is there a better method with standalone
> Solr?

Zookeeper has fairly low system requirements compared to Solr, so using 
a third machine with lower specs to just run the tie-breaker ZK is a 
good way to go.

Note that you'll only have full redundancy at the client level with that 
setup if your client is ZK-aware.  The only Solr client I know about 
that's ZK aware is the Java client, which is part of Solr itself as well 
as being a standalone client.  For full redundancy with HTTP-only 
clients you'll need a virtual IP address that can be shared among the 
servers, and have a load balancer listening on the virtual IP.  Setting 
that up is done with software other than Solr and ZK, so it's not 
on-topic for this mailing list.  Depending on the capabilities of the 
third server, it could be the primary for load-balancing as well as the 
third machine for ZK.  That's what I would do with limited resources.

Thanks,
Shawn

Re: How to run Solr on two servers for redundancy

Posted by dmitri maziuk <dm...@gmail.com>.

On 2022-03-13 8:07 PM, Shawn Heisey wrote:

> ...  Trying to shuffle an iSCSI volume between hosts sounds like a 
> very brittle setup to me.

I haven't done it with iSCSI. That setup is stable enough with DRBD and 
it's stable enough with ZFS on dual-ported SAS drives, so I really don't 
see why iSCSI would be any less stable -- obviously given the good 
enough "i" part.

But I was thinking more of a docker infra where data lives in iSCSI 
volumes anyway. If I had to do it on a bare-metal pair of servers, I'd 
probably go CARP and I forget what BSD's equivalent of DRBD is, but my 
first choice would be docker.

Dima

Re: How to run Solr on two servers for redundancy

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/13/2022 5:27 PM, dmitri maziuk wrote:
> If you do the former, you could probably put the index on an iSCSI 
> volume and have it remounted on the standby host/container as part of 
> the failover. If you do the latter, might as well do what Shawn said 
> (but then of course "now you have two problems" because the proxy host 
> can fail too).

My preferred solution for a minimalist SolrCloud install would be to run 
ucarp or pacemaker on all 3 hosts, handling a virtual IP address.  If 
it's pacemaker, have it also manage which machine is running haproxy for 
load balancing.  If it's ucarp, haproxy would be always running on all 3 
hosts, because ucarp is a very simple program.  Two of the hosts would 
run ZK and Solr in addition to all that, the third machine would run ZK 
but not Solr.  The third machine would be the preferred host for the IP 
address and haproxy, but those would also be able to transfer to one of 
the other two if that host goes down.

Each Solr instance would have a full copy of the data in TLOG replicas.  
That way there would be two completely identical copies of all index 
data.  Trying to shuffle an iSCSI volume between hosts sounds like a 
very brittle setup to me.
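
As an illustration of the replica layout, the Python sketch below builds a Collections API CREATE request for a single shard with two TLOG replicas and no NRT replicas. The base URL and collection name are placeholders; only the URL is constructed, nothing is sent.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str, tlog_replicas: int = 2) -> str:
    """Build a Collections API CREATE URL for one shard with TLOG replicas only."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "nrtReplicas": 0,          # no NRT replicas, only TLOG ones
        "tlogReplicas": tlog_replicas,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url("http://solr1:8983/solr", "docs"))
```

With two Solr nodes and `tlogReplicas=2`, Solr would normally place one replica on each node, which gives the two identical copies described above.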

In my experience, pacemaker and haproxy are very stable when correctly 
configured.

Thanks,
Shawn

Re: How to run Solr on two servers for redundancy

Posted by dmitri maziuk <dm...@gmail.com>.

On 2022-03-13 10:23 AM, Sam Lee wrote:

> However, this looks like a waste because the third server would
> essentially be doing nothing. Is there a better method with standalone
> Solr?

You'll need to either fail over your "cluster IP" or have the 3rd host 
proxying HTTP requests to the active Solr node.

If you do the former, you could probably put the index on an iSCSI 
volume and have it remounted on the standby host/container as part of 
the failover. If you do the latter, might as well do what Shawn said 
(but then of course "now you have two problems" because the proxy host 
can fail too).

Dima