Posted to users@solr.apache.org by mtn search <se...@gmail.com> on 2022/03/01 23:02:02 UTC

SolrCloud HOT|HOT HA Arch

Hello,

My team is looking to deploy Solr 8 SolrCloud on two on-prem datacenters
via EKS.  We are considering a HOT | HOT HA architecture between the
datacenters, where data would be indexed (duplicated) to SolrCloud instances
in both datacenters. Then, via a service (to be worked out), queries could go
to either datacenter.

I believe one of the challenges will be keeping the SolrCloud instances
(holding the same data) in sync.

I am curious if others have tried this and are willing to share any tips,
lessons learned, or things we should consider.

Thanks,
Matt

Re: SolrCloud HOT|HOT HA Arch

Posted by Walter Underwood <wu...@wunderwood.org>.
We create “search feeds”, which are S3 files with one JSON object per line. Documents going to Solr go into a feed file first. Periodically, the files are fetched and loaded into Solr.
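
A loader along these lines might look like the following Python sketch. It is not the poster's actual code: it assumes the feed file has already been fetched from S3 to local storage, that documents are posted in batches to Solr's JSON update handler, and that the collection URL and batch size shown are placeholders.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int = 500) -> Iterator[List[dict]]:
    """Parse one JSON object per line and yield lists of at most batch_size docs."""
    batch: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the feed file
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, batch


def post_batch(update_url: str, docs: List[dict], timeout: float = 30.0) -> int:
    """POST a JSON array of documents to a Solr update handler; return the HTTP status."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def load_feed(lines: Iterable[str], update_url: str, batch_size: int = 500) -> int:
    """Load one feed file into one cluster; return the number of docs sent."""
    sent = 0
    for batch in iter_batches(lines, batch_size):
        post_batch(update_url, batch)
        sent += len(batch)
    return sent
```

Running load_feed once per cluster, for example with a hypothetical URL such as http://solr-dc1:8983/solr/mycollection/update?commitWithin=10000, keeps each cluster's load independent of the others, so a failed or slow cluster can be re-fed from the same file later.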

S3 is cross-region, so we could easily use this for multiple hot search clusters. More often, we’ve used it for major version upgrades: make a new cluster with version 8, then feed both the Solr 6 and Solr 8 clusters independently from the feed files. After traffic is moved over, stop feeding the Solr 6 cluster and recycle the machines.

For disaster recovery, we’d rebuild the cluster (Terraform), then run the loader.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: SolrCloud HOT|HOT HA Arch

Posted by Matt Kuiper <ku...@gmail.com>.
Thanks Anshum, Dima!  Yes, I figure this approach will be quite challenging
to implement, and may not be worth the cost.

Anshum,

I had not thought of versioning (
https://solr.apache.org/guide/8_2/updating-parts-of-documents.html#document-centric-versioning-constraints),
but will consider it.  Yes, some of our updates are Atomic updates.

Yes, the initial thinking is to use a single "queue" of updates, where multiple
instances of the same indexing service (each associated with a particular
SolrCloud instance) will consume from the queue and index to their associated
SolrCloud instance.
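
One way to picture this is the in-memory Python sketch below. It assumes the "queue" behaves like a shared append-only log in which every consumer keeps its own read position, so each datacenter's indexer sees every update; a plain work queue that hands each message to only one consumer would not duplicate updates to both clusters. The class and function names are hypothetical, and index_fn stands in for whatever actually sends documents to Solr.

```python
from typing import Callable, Dict, List


class UpdateLog:
    """Append-only log of updates with an independent offset per consumer."""

    def __init__(self) -> None:
        self._entries: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def append(self, update: dict) -> None:
        self._entries.append(update)

    def poll(self, consumer: str, max_items: int = 100) -> List[dict]:
        """Return the next unread updates for this consumer without advancing it."""
        start = self._offsets.get(consumer, 0)
        return self._entries[start:start + max_items]

    def commit(self, consumer: str, count: int) -> None:
        """Advance this consumer's offset after a successful index call."""
        self._offsets[consumer] = self._offsets.get(consumer, 0) + count


def drain(log: UpdateLog, consumer: str, index_fn: Callable[[List[dict]], None]) -> int:
    """Index everything this consumer has not yet seen; return the count."""
    total = 0
    while True:
        batch = log.poll(consumer)
        if not batch:
            return total
        index_fn(batch)  # if this raises, the offset is not advanced
        log.commit(consumer, len(batch))
        total += len(batch)
```

Because the offset only moves after index_fn returns, an indexer whose cluster was unreachable resumes from where it stopped and eventually receives the same sequence of updates as the other datacenter.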

I will take a look at your proposal!

Thanks again,

Matt

Re: SolrCloud HOT|HOT HA Arch

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Matt,

I'll start by saying that this has been long overdue on my end.

There are a multitude of challenges with a hot-hot architecture involving
multiple SolrCloud clusters. An important question here is whether you are
going to manage the versioning yourself, and also whether your updates would
ever overwrite data. Here's an initial proposal for something along those lines
(though it doesn't support an unversioned hot-hot setup w/ document edits) -
https://cwiki.apache.org/confluence/display/SOLR/SIP-13%3A+Cross+Data+Center+Replication

Hot-hot setups are really complex, and there are a few ways I've handled them
(or seen them being handled):
1. The best way here is to have externally versioned documents sent to the
Solr clusters, or
2. rely on a single point of entry, i.e. updates always go to a queuing
service, for instance, and then have an application that's responsible for
consuming from this queue.
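
To illustrate why external versions help, here is a small in-memory Python model (not Solr code) of the "only accept a newer version" rule. It assumes each document carries an application-assigned version in a hypothetical doc_version field and that updates are full-document overwrites; atomic (partial) updates are not modeled.

```python
from typing import Dict, Iterable


def apply_update(index: Dict[str, dict], doc: dict, version_field: str = "doc_version") -> bool:
    """Store doc only if its external version is newer than the stored one."""
    current = index.get(doc["id"])
    if current is not None and doc[version_field] <= current[version_field]:
        return False  # stale or duplicate update: ignore it
    index[doc["id"]] = doc
    return True


def replay(updates: Iterable[dict], version_field: str = "doc_version") -> Dict[str, dict]:
    """Apply a sequence of updates to an empty index and return the result."""
    index: Dict[str, dict] = {}
    for doc in updates:
        apply_update(index, doc, version_field)
    return index
```

With this rule, two clusters that receive the same updates in different orders, or receive some of them twice, still end up holding the same documents, which is the property a hot-hot setup needs.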

-Anshum

-- 
Anshum Gupta

Re: SolrCloud HOT|HOT HA Arch

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-01 5:02 PM, mtn search wrote:

> I am curious if others have tried this and are willing to share any tips,
> lessons learned, or things we should consider.

Not specific to Solr, but it's infinitely easier to do active-passive HA
than active-active (if that's what you mean by hot-hot).

Dima