Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2017/06/02 12:57:29 UTC

Re: Spread SolrCloud across two locations

On 5/29/2017 8:57 AM, Jan Høydahl wrote:
> And if you start all three in DC1, you have 3+3 voting, what would
> then happen? Any chance of state corruption?
>
> I believe that my solution isolates manual change to two ZK nodes in
> DC2, while yours requires config change to 1 in DC2 and manual
> start/stop of 1 in DC1.

I took the scenario to the zookeeper user list.  Here's the thread:

http://zookeeper-user.578899.n2.nabble.com/Yet-another-quot-two-datacenter-quot-discussion-td7583106.html

I'm not completely clear on what they're saying, but here's what I think
it means:  Dealing with a loss of DC1 by reconfiguring ZK servers in DC2
might work, or it might crash and burn once connectivity to DC1 is restored.
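
To make the quorum arithmetic behind that risk concrete, here is a minimal
sketch in Python. It is not taken from the ZooKeeper thread. The node counts
(three voters per datacenter) and the function names are assumptions chosen
for illustration only. It shows that a shared six-voter ensemble split 3/3
leaves neither side with a majority, while two sides that each believe in
their own three-voter ensemble can both reach a majority at once, which is
the situation that could go wrong when DC1 comes back.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` voting members."""
    return voters // 2 + 1


def has_quorum(reachable: int, configured: int) -> bool:
    """True if `reachable` voters form a majority of `configured` voters."""
    return reachable >= quorum_size(configured)


# Assumed case 1: one shared config lists 6 voters, network splits 3/3.
# Neither datacenter can reach the required majority of 4.
shared_dc1 = has_quorum(reachable=3, configured=6)
shared_dc2 = has_quorum(reachable=3, configured=6)

# Assumed case 2: DC2 was reconfigured to its own 3-voter ensemble, while
# DC1 still runs an old config that also lists only its own 3 voters.
# Each side needs only 2 of 3, so both can elect a leader independently.
old_dc1 = has_quorum(reachable=3, configured=3)
new_dc2 = has_quorum(reachable=3, configured=3)
split_brain_possible = old_dc1 and new_dc2
```

In the first case the ensemble simply stops accepting writes. In the second,
two independent leaders can exist, so the two sides may diverge.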

> Well, that’s not up to me to decide, it’s the customer environment
> that sets the constraints, they currently have 2 independent geo
> locations. And Solr is just a dependency of some other app they need
> to install, so doubt that they are very happy to start adding racks or
> independent power/network for this alone. Of course, if they already
> have such redundancy within one of the DCs, placing a 3rd ZK there is
> an ideal solution with probably good enough HA. If not, I’m looking
> for the 2nd best low-friction approach with software-only.

Even if all goes well with scripted reconfiguration of DC2, I don't
think I'd want to try and automate it, because of the chance for a brief
outage to trigger it.  Without automation, if the failure happened at
just the wrong moment, it could be a while before anyone notices, and it
might be hours after it gets noticed before relevant personnel are in a
position to run the reconfiguration script on DC2, during which you'd
have a read-only SolrCloud.

Frequently search is such a critical part of a web application that
if it doesn't work, there IS no web application.  That certainly
describes the systems that use the Solr installations that I manage. 
For that kind of application, damage to reputation caused by a couple of
hours where the website doesn't get any updates might be MUCH more
expensive than the monthly cost for a virtual private server from a
hosting company.

Thanks,
Shawn


Re: Spread SolrCloud across two locations

Posted by Jan Høydahl <ja...@cominvent.com>.
Thanks for checking Shawn.

So rolling ZK restarts are bad, and ZK nodes with different configs are bad.
I guess this could still work if:
* All ZK config changes are done by stopping ALL ZK nodes
* All config changes are done in a controlled, manual way so DC1 doesn't come up by surprise with the old config

PS: I was not proposing *automatic* triggering of a reconfiguration script, but rather having a script that someone runs manually, so that the reconfiguration does not get messed up.
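
As one possible ingredient of such a manually run script, the sketch below
checks that every node's config lists the same ensemble before anything is
started. It is only an illustration: it assumes the configs are available as
zoo.cfg-style text with `server.N=host:port:port` lines, and the function
names and hostnames are hypothetical.

```python
def server_lines(cfg_text: str) -> frozenset[str]:
    """Return the set of `server.N=...` entries found in zoo.cfg-style text."""
    entries = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("server."):
            # Ignore spacing differences around '=' when comparing.
            entries.add(line.replace(" ", ""))
    return frozenset(entries)


def configs_agree(configs: dict[str, str]) -> bool:
    """True if every node's config lists exactly the same server entries."""
    return len({server_lines(text) for text in configs.values()}) <= 1


# Hypothetical example: the DC1 node still carries an old server list.
new_cfg = "tickTime=2000\nserver.1=zk-dc2-a:2888:3888\nserver.2=zk-dc2-b:2888:3888\n"
old_cfg = "tickTime=2000\nserver.1=zk-dc1-a:2888:3888\nserver.2=zk-dc2-a:2888:3888\n"

all_new = configs_agree({"zk-dc2-a": new_cfg, "zk-dc2-b": new_cfg})
mixed = configs_agree({"zk-dc2-a": new_cfg, "zk-dc1-a": old_cfg})
```

A script could refuse to start any node while `configs_agree` returns False.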

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
