Posted to solr-user@lucene.apache.org by David Hastings <ha...@gmail.com> on 2017/12/15 16:56:32 UTC

legacy replication

So I don't step on the other thread: I want to be assured whether or not
legacy master/slave/repeater replication will continue to be supported in
future Solr versions.  Our infrastructure is set up for this and already
covers all the HA redundancy that SolrCloud provides; we have spent a lot
of time and resources on very expensive servers to handle Solr in
standalone mode.

thanks.
-David

Re: legacy replication

Posted by Erick Erickson <er...@gmail.com>.
Yeah, much as I love SolrCloud (and make most of my living working
with it), it does have its complexities.

My rule of thumb is that you really want to consider SolrCloud when
you start having to shard or need NRT
searching.

For the former, you trade the complexity of maintaining your own sharding
for the complexity of ZooKeeper; the latter simply can't be done with
master/slave, so whatever floats your boat ;)

About ZooKeeper: with a system your size, you would absolutely _not_
need any more than three; a three-node ensemble keeps its quorum when
one node goes down. In fact, if you were willing to accept that your one
ZooKeeper going down would prevent updates from happening, you could run
with just one. And your ZK machines can be cheap boxes; they don't need
all that much processing power.
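
For reference, a minimal zoo.cfg for a three-node ensemble looks roughly
like this (hostnames and dataDir are just placeholders; the same file goes
on all three boxes, and each box also gets a myid file under dataDir
containing its server number):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888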

That said, if it's just a matter of plopping in Solr 7x (or whatever)
over your existing infrastructure that's been
running for years, I really can't say you should move to SolrCloud....

Best,
Erick

On Sat, Dec 16, 2017 at 9:36 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 12/15/2017 12:12 PM, David Hastings wrote:
>>
>> Also the complexity of adding another 3
>> or more machines just to do nothing but ZK stuff was getting out of hand.
>
>
> You can run ZK on the same machines that are running Solr.  The only strong
> recommendation that I would make is that it should be a completely separate
> process, not embedded within Solr.  The ZK process is unlikely to need much
> of a heap, unless your ZK database is huge.
>
> It can also be useful to have ZK's data on separate disks from other things
> on the machine, but this is not usually necessary.
>
> Thanks,
> Shawn

Re: legacy replication

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/15/2017 12:12 PM, David Hastings wrote:
> Also the complexity of adding another 3
> or more machines just to do nothing but ZK stuff was getting out of hand.

You can run ZK on the same machines that are running Solr.  The only 
strong recommendation that I would make is that it should be a 
completely separate process, not embedded within Solr.  The ZK process 
is unlikely to need much of a heap, unless your ZK database is huge.
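
Concretely, that just means running ZooKeeper's own zkServer.sh on each
box and pointing Solr at the external ensemble with -z, instead of letting
Solr start its embedded ZK.  Something like this (paths and hostnames are
only examples):

    # start the standalone ZK process on each box
    /opt/zookeeper/bin/zkServer.sh start

    # start Solr in cloud mode against the external ensemble
    bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181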

It can also be useful to have ZK's data on separate disks from other 
things on the machine, but this is not usually necessary.

Thanks,
Shawn

Re: legacy replication

Posted by David Hastings <ha...@gmail.com>.
Understandable.  Right now we have a large setup of Solr 5.x servers that
has been doing great for years, but the time to upgrade has come, with
some things that we want that are not available in the 5.x branch.  I
really like legacy (master/slave) replication for the reasons you stated,
but also because the cloud setup only seems perfect if you have a handful
of cheap machines around.  Our production setup has 1 indexer with a slave
that polls every 5 minutes, and on releases we have 3 searching servers
that poll manually.  Thing is, these machines have over 32 cores and over
200GB of RAM with 2TB SSDs each; they were not cheap and are pretty fast
with standalone Solr.  Also, the complexity of adding another 3 or more
machines just to do nothing but ZK stuff was getting out of hand.  If it's
not broken, I'm not about to fix it.
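
For what it's worth, the manual poll on release day is just a hit against
the replication handler on each searching server, roughly like this (host
and core name are placeholders):

    curl "http://search1:8983/solr/mycore/replication?command=fetchindex"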

In any case, I'm glad to hear legacy replication will stay.
Thanks,
-Dave

On Fri, Dec 15, 2017 at 1:15 PM, Walter Underwood <wu...@wunderwood.org>
wrote:

> I love legacy replication. It is simple and bulletproof. Loose coupling
> for the win! We only run Solr Cloud when we need sharding or NRT search.
> Loose coupling is a very, very good thing in distributed systems.
>
> Adding a replica (new slave) is trivial. Clone an existing one. This makes
> horizontal scaling so easy. We still haven’t written the procedure and
> scripts for scaling our Solr Cloud cluster. Last time, it was 100% manual
> through the admin UI.
>
> Setting up a Zookeeper ensemble isn’t as easy as it should be. We tried to
> set up a five node ensemble with ZK 3.4.6 and finally gave up after two
> weeks because it was blocking the release. We are using the three node
> 3.4.5 ensemble that had been set up for something else a couple of years
> earlier. I’ve had root on Unix since 1981 and have been running TCP/IP
> since 1983, so I should have been able to figure this out.
>
> We’ve had some serious prod problems with the Solr Cloud cluster, like
> cores stuck in a permanent recovery loop. I finally manually deleted that
> core and created a new one. Ugly.
>
> Even starting Solr Cloud processes is confusing. It took a while to figure
> out they were all joining as the same host (no, I don’t know why), so now
> we start them as: solr start -cloud -h `hostname`
>
> Keeping configs under source control and deploying them isn’t easy. I’m
> not going to install Solr on the Jenkins executor just so it can deploy,
> that is weird and kind of a chicken and egg thing. I ended up writing a
> Python program to get the ZK address from the cluster, use kazoo to load
> directly to ZK, then tell the cluster to reload. Both with that and with
> the provided ZK tools I ran into so much undocumented stuff. What is
> linking? How do the file config directories map to the ZK config
> directories? And so on.
>
> The lack of a thread pool for requests is a very serious problem. If our
> 6.5.1 cluster gets overloaded, it creates 4000 threads, runs out of memory
> and fails. That is just wrong. With earlier versions of Solr, it would get
> slower and slower, but recover gracefully.
>
> Converting a slave into a master is easy. We use this in the config file:
>
>    <lst name="master">
>       <str name="enable">${enable.master:false}</str>
>    …
>    <lst name="slave">
>       <str name="enable">${enable.slave:false}</str>
>
> And this at startup (slave config shown): -Denable.master=false
> -Denable.slave=true
>
> Change the properties and restart.
>
> Our 6.5.1 cluster is faster than the non-sharded 4.10.4 master/slave
> cluster, but I’m not happy with the stability in prod. We’ve had more
> search outages in the past six months than we had in the previous four
> years. I’ve had Solr in prod since version 1.2, and this is the first time
> it has really embarrassed me.
>
> There are good things. Search is faster, we’re handling double the query
> volume with 3X the docs.
>
> Sorry for the rant, but it has not been a good fall semester for our
> students (customers).
>
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Dec 15, 2017, at 9:46 AM, Erick Erickson <er...@gmail.com> wrote:
> >
> > There's pretty much zero chance that it'll go away, too much current
> > and ongoing functionality that depends on it.
> >
> > 1> old-style replication has always been used for "full sync" in
> > SolrCloud when peer sync can't be done.
> >
> > 2> The new TLOG and PULL replica types are a marriage of old-style
> > master/slave and SolrCloud. In particular a PULL replica is
> > essentially an old-style slave. A TLOG replica is an old-style slave
> > that also maintains a transaction log so it can take over leadership
> > if necessary.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
> > <ha...@gmail.com> wrote:
> >> So I don't step on the other thread: I want to be assured whether or
> >> not legacy master/slave/repeater replication will continue to be
> >> supported in future Solr versions.  Our infrastructure is set up for
> >> this and already covers all the HA redundancy that SolrCloud provides;
> >> we have spent a lot of time and resources on very expensive servers to
> >> handle Solr in standalone mode.
> >>
> >> thanks.
> >> -David
>
>

Re: legacy replication

Posted by Walter Underwood <wu...@wunderwood.org>.
I love legacy replication. It is simple and bulletproof. Loose coupling for the win! We only run Solr Cloud when we need sharding or NRT search. Loose coupling is a very, very good thing in distributed systems.

Adding a replica (new slave) is trivial. Clone an existing one. This makes horizontal scaling so easy. We still haven’t written the procedure and scripts for scaling our Solr Cloud cluster. Last time, it was 100% manual through the admin UI.

Setting up a Zookeeper ensemble isn’t as easy as it should be. We tried to set up a five node ensemble with ZK 3.4.6 and finally gave up after two weeks because it was blocking the release. We are using the three node 3.4.5 ensemble that had been set up for something else a couple of years earlier. I’ve had root on Unix since 1981 and have been running TCP/IP since 1983, so I should have been able to figure this out.

We’ve had some serious prod problems with the Solr Cloud cluster, like cores stuck in a permanent recovery loop. I finally manually deleted that core and created a new one. Ugly.

Even starting Solr Cloud processes is confusing. It took a while to figure out they were all joining as the same host (no, I don’t know why), so now we start them as: solr start -cloud -h `hostname`

Keeping configs under source control and deploying them isn’t easy. I’m not going to install Solr on the Jenkins executor just so it can deploy, that is weird and kind of a chicken-and-egg thing. I ended up writing a Python program to get the ZK address from the cluster, use kazoo to load directly to ZK, then tell the cluster to reload. Both with that and with the provided ZK tools I ran into so much undocumented stuff. What is linking? How do the file config directories map to the ZK config directories? And so on.
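
The core of that program is small. A rough sketch (names are placeholders, the ZK address is hard-coded here instead of discovered from the cluster, and error handling and config subdirectories are left out):

    import os
    import requests
    from kazoo.client import KazooClient

    ZK_HOST = "zk1:2181,zk2:2181,zk3:2181"  # discovered from the cluster in the real program
    SOLR = "http://solr1:8983/solr"         # any node in the cluster
    CONF_DIR = "conf"                       # local directory with solrconfig.xml, schema, etc.
    CONF_NAME = "myconfig"                  # configset name under /configs in ZK
    COLLECTION = "mycollection"

    # Push every file in the local conf directory to /configs/<CONF_NAME>/<file>.
    zk = KazooClient(hosts=ZK_HOST)
    zk.start()
    for fname in os.listdir(CONF_DIR):
        with open(os.path.join(CONF_DIR, fname), "rb") as f:
            data = f.read()
        path = "/configs/%s/%s" % (CONF_NAME, fname)
        if zk.exists(path):
            zk.set(path, data)
        else:
            zk.create(path, data, makepath=True)
    zk.stop()

    # Tell the cluster to pick up the new config.
    requests.get(SOLR + "/admin/collections",
                 params={"action": "RELOAD", "name": COLLECTION})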

The lack of a thread pool for requests is a very serious problem. If our 6.5.1 cluster gets overloaded, it creates 4000 threads, runs out of memory and fails. That is just wrong. With earlier versions of Solr, it would get slower and slower, but recover gracefully.

Converting a slave into a master is easy. We use this in the config file:

   <lst name="master">
      <str name="enable">${enable.master:false}</str>
   …
   <lst name="slave">
      <str name="enable">${enable.slave:false}</str>

And this at startup (slave config shown): -Denable.master=false -Denable.slave=true

Change the properties and restart.
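
Spelled out a little more fully, the whole handler looks roughly like this (master URL, poll interval, and conf files are placeholders); the same solrconfig.xml goes on every box and the -D properties pick the role:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">http://indexer:8983/solr/mycore</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>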

Our 6.5.1 cluster is faster than the non-sharded 4.10.4 master/slave cluster, but I’m not happy with the stability in prod. We’ve had more search outages in the past six months than we had in the previous four years. I’ve had Solr in prod since version 1.2, and this is the first time it has really embarrassed me.

There are good things. Search is faster, we’re handling double the query volume with 3X the docs.

Sorry for the rant, but it has not been a good fall semester for our students (customers).

Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 15, 2017, at 9:46 AM, Erick Erickson <er...@gmail.com> wrote:
> 
> There's pretty much zero chance that it'll go away, too much current
> and ongoing functionality that depends on it.
> 
> 1> old-style replication has always been used for "full sync" in
> SolrCloud when peer sync can't be done.
> 
> 2> The new TLOG and PULL replica types are a marriage of old-style
> master/slave and SolrCloud. In particular a PULL replica is
> essentially an old-style slave. A TLOG replica is an old-style slave
> that also maintains a transaction log so it can take over leadership
> if necessary.
> 
> Best,
> Erick
> 
> On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
> <ha...@gmail.com> wrote:
>> So I don't step on the other thread: I want to be assured whether or not
>> legacy master/slave/repeater replication will continue to be supported in
>> future Solr versions.  Our infrastructure is set up for this and already
>> covers all the HA redundancy that SolrCloud provides; we have spent a lot
>> of time and resources on very expensive servers to handle Solr in
>> standalone mode.
>>
>> thanks.
>> -David


Re: legacy replication

Posted by Erick Erickson <er...@gmail.com>.
There's pretty much zero chance that it'll go away; too much current
and ongoing functionality depends on it.

1> old-style replication has always been used for "full sync" in
SolrCloud when peer sync can't be done.

2> The new TLOG and PULL replica types are a marriage of old-style
master/slave and SolrCloud. In particular a PULL replica is
essentially an old-style slave. A TLOG replica is an old-style slave
that also maintains a transaction log so it can take over leadership
if necessary.
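
If you ever do try SolrCloud, you ask for those replica types when you
create the collection, something like this (collection name and counts
are just an example):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&tlogReplicas=2&pullReplicas=2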

Best,
Erick

On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
<ha...@gmail.com> wrote:
> So I don't step on the other thread: I want to be assured whether or not
> legacy master/slave/repeater replication will continue to be supported in
> future Solr versions.  Our infrastructure is set up for this and already
> covers all the HA redundancy that SolrCloud provides; we have spent a lot
> of time and resources on very expensive servers to handle Solr in
> standalone mode.
>
> thanks.
> -David